
Solr Synonym Search (synonyms)

2013-11-14 

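Synonym search in Solr is a very useful feature that solves a significant problem in product search. For example, when a user searches for "刮胡刀" (shaver), the results are better if products labeled "剃须刀" (also "shaver") are shown alongside those labeled "刮胡刀". The implementation is described below.

solr.SynonymFilterFactory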

Creates a SynonymFilter.

Matches strings of tokens and replaces them with other strings of tokens.

  1. The synonyms parameter names an external file defining the synonyms.
  2. If ignoreCase is true, matching will lowercase before checking equality.
  3. If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.
  4. The optional tokenizerFactory parameter names a tokenizer factory class to analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319 ), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1 .
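To make the ignoreCase and expand parameters concrete, here is a minimal Python sketch of how one comma-separated synonym group behaves. It is an illustration only, not Solr's implementation: `expand_group` and `apply_synonyms` are hypothetical helpers, matching is done one token at a time (the real filter also matches multi-token sequences), and lowercasing is applied only to the lookup key, so the sketch makes no claim about the case of the tokens Solr actually emits.

```python
def expand_group(group, expand):
    """Build a lookup table for one comma-separated synonym line (hypothetical helper)."""
    if expand:
        # expand=true: every term maps to all equivalent terms in the group
        return {term: list(group) for term in group}
    # expand=false: every term is reduced to the first term in the list
    return {term: [group[0]] for term in group}


def apply_synonyms(tokens, table, ignore_case=True):
    """Replace each token by its synonyms; unmatched tokens pass through unchanged.

    Matching is single-token only in this sketch.
    """
    if ignore_case:
        # ignoreCase=true: lowercase before checking equality
        table = {key.lower(): value for key, value in table.items()}
    result = []
    for tok in tokens:
        key = tok.lower() if ignore_case else tok
        result.append(table.get(key, [tok]))
    return result


group = ["GB", "gib", "gigabyte", "gigabytes"]
print(apply_synonyms(["2", "gb"], expand_group(group, expand=True)))
# [['2'], ['GB', 'gib', 'gigabyte', 'gigabytes']]
print(apply_synonyms(["2", "gigabytes"], expand_group(group, expand=False)))
# [['2'], ['GB']]
```

With expand=false every member of the group collapses to its first entry, "GB", which is what item 3 describes; with ignore_case=False the lowercase input "gb" would not match "GB" at all.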

schema.xml configuration:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
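The field type above runs the synonym filter with expand="true" at both index time and query time. The toy inverted index below is a hypothetical sketch, not anything Solr exposes, showing why index-time expansion by itself already covers the 刮胡刀/剃须刀 case: a document is indexed under every equivalent term, so a query for either word finds it. It assumes the text has already been split into whole words, ignores the stop-word, word-delimiter and duplicate-removal filters, and does not reproduce how solr.ChineseTokenizerFactory actually segments Chinese text. The documents and the "电动牙刷" (electric toothbrush) entry are made-up sample data.

```python
def analyze(tokens, groups, expand):
    """Toy analyzer: apply comma-group synonyms with ignoreCase=true, then lowercase."""
    lookup = {}
    for group in groups:
        for term in group:
            # expand=True -> all equivalents; expand=False -> reduce to the first entry
            lookup[term.lower()] = group if expand else group[:1]
    terms = []
    for tok in tokens:
        terms.extend(t.lower() for t in lookup.get(tok.lower(), [tok]))
    return terms


def build_index(docs, groups, expand):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, tokens in docs.items():
        for term in analyze(tokens, groups, expand):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query_tokens, groups, expand):
    """Return ids of documents matching any analyzed query term (OR semantics)."""
    hits = set()
    for term in analyze(query_tokens, groups, expand):
        hits |= index.get(term, set())
    return hits


groups = [["飞利浦刮胡刀", "飞利浦剃须刀"]]
docs = {1: ["飞利浦剃须刀"], 2: ["飞利浦刮胡刀"], 3: ["电动牙刷"]}

index = build_index(docs, groups, expand=True)
print(search(index, ["飞利浦刮胡刀"], groups, expand=True))  # {1, 2}
print(search(index, ["飞利浦刮胡刀"], [], expand=True))  # {1, 2}: no query-time synonyms needed
print(search(build_index(docs, [], True), ["飞利浦刮胡刀"], [], True))  # {2}: no synonyms at all
```

In this toy model, expanding at query time as well (as the schema does) returns the same hits; it only adds extra query terms.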

synonyms.txt configuration:
# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television,Televisions, TV,TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.

飞利浦刮胡刀,飞利浦剃须刀

# Synonym mappings can be used for spelling correction too
pixima => pixma

a\,a => b\,b
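The rules stated in the file's own comments can be summarized with a small parser. This is a hedged Python sketch, not Solr's parser: `parse_synonyms` and `split_unescaped` are hypothetical names. It treats only lines starting with "#" as comments, lets a "=>" mapping ignore expand, lets a plain comma list follow expand, and reads "\," as a literal comma. As simplifications, it splits on the first "=>" without honoring escapes there, and it keeps multi-word entries such as "bbbfoo bbbbar" as one phrase string rather than a token sequence.

```python
def split_unescaped(text, sep):
    """Split on `sep` unless escaped with a backslash, then drop the escapes."""
    parts, buf, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            buf.append(text[i + 1])  # keep the escaped character literally
            i += 2
        elif text.startswith(sep, i):
            parts.append("".join(buf).strip())
            buf = []
            i += len(sep)
        else:
            buf.append(text[i])
            i += 1
    parts.append("".join(buf).strip())
    return [p for p in parts if p]


def parse_synonyms(text, expand=True):
    """Return a dict mapping each source phrase to its list of replacement phrases."""
    rules = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and lines starting with '#' are comments
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            sources = split_unescaped(lhs, ",")
            targets = split_unescaped(rhs, ",")  # explicit mappings ignore expand
        else:
            sources = split_unescaped(line, ",")
            targets = sources if expand else sources[:1]
        for src in sources:
            bucket = rules.setdefault(src, [])
            for target in targets:
                if target not in bucket:  # merge repeated rules without duplicates
                    bucket.append(target)
    return rules


sample = "aaafoo => aaabar\nfooaaa,baraaa,bazaaa\na\\,a => b\\,b\n"
print(parse_synonyms(sample)["baraaa"])  # ['fooaaa', 'baraaa', 'bazaaa']
print(parse_synonyms(sample, expand=False)["baraaa"])  # ['fooaaa']
print(parse_synonyms(sample)["a,a"])  # ['b,b']
```

The explicit mapping "aaafoo => aaabar" yields ['aaabar'] whether or not expand is set, matching the comment at the top of the file.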
