
Lucene 3 usage example

2012-09-08 


Code for inserting documents into Lucene to build an index

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public class DocInsert {

    private static IndexWriter indexwrite = null;

    static {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        // Store the index in memory:
        // Directory directory = new RAMDirectory();
        // To store an index on disk, use this instead:
        try {
            Directory directory = FSDirectory.open(new File("E:\\output\\lucence\\index"));
            indexwrite = new IndexWriter(directory, analyzer, true,
                    new IndexWriter.MaxFieldLength(25000));
        } catch (CorruptIndexException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public static void createDoc() throws CorruptIndexException, IOException {
        List<String> datalist = org.apache.commons.io.IOUtils.readLines(
                new InputStreamReader(
                        new FileInputStream(new File("E:\\output\\lucence\\data\\data.txt")), "GBK"));
        for (String str : datalist) {
            Document doc = new Document();
            String[] text = str.split("\t");
            if (text.length < 2) {
                continue;
            }
            doc.add(new Field("context", text[1], Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("id", text[0], Field.Store.YES, Field.Index.ANALYZED));
            indexwrite.addDocument(doc);
        }
    }

    public static void main(String[] args) throws CorruptIndexException, IOException {
        createDoc();
        indexwrite.commit();
        indexwrite.close();
    }
}


The data format is as follows (one record per line: an id and the text, separated by a tab):

4915779	球泡灯套件
4915777	15018506651求购三星i559 i569 i579
4915775	采购雪纺格子印花面料
4915773	汽泡信封袋
4915771	电泳加工
4915769	6405 2RS
4915767	蓝色丁腈手套
4915765	采购求购KO3-15T八角
4915763	胸杯
4915761	封箱胶带
4915759	6404 2RS
4915757	Ipad 车载支架
4915755	礼品,文具,墙贴,基督教礼品
4915753	品牌内衣
4915751	聚丙烯酸
4915749	餐饮消毒毛巾、湿巾
4915747	提花
4915745	6403 2RS
4915743	采购如:葛根粉丝、蕨根粉丝、南瓜粉丝、野菜粉丝、香菇粉丝等
4915741	二手摩托车
4915739	急需采购PVC特殊袋子
4915737	女士T恤
4915735	烤弯镀膜玻璃
4915731	批发野生羊肚菌
4915733	ABS管道粘结剂 ABS胶
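
The indexing code splits each input line on a tab and skips lines with fewer than two fields. A minimal sketch of just that parsing step, using only the JDK, might look like the following. The class and method names (`RecordParser`, `parseLine`, `parseAll`) are hypothetical and do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the "id<TAB>text" record format.
class RecordParser {

    // Returns {id, text}, or null when the line has fewer than two tab-separated fields.
    static String[] parseLine(String line) {
        String[] parts = line.split("\t");
        if (parts.length < 2) {
            return null;
        }
        return new String[] { parts[0], parts[1] };
    }

    // Parses all lines and silently drops malformed ones, as the indexing loop does.
    static List<String[]> parseAll(List<String> lines) {
        List<String[]> records = new ArrayList<String[]>();
        for (String line : lines) {
            String[] record = parseLine(line);
            if (record != null) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "4915769\t6405 2RS",
                "a line without a tab",
                "4915745\t6403 2RS");
        for (String[] record : parseAll(lines)) {
            System.out.println(record[0] + " -> " + record[1]);
        }
    }
}
```

Separating the parsing from the index writing makes it easy to check how many lines are silently dropped before anything reaches the index.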


The search code example is as follows:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class DocSearch {

    private static IndexSearcher isearcher = null;

    public static void search(String key) throws IOException, ParseException {
        Directory directory = FSDirectory.open(new File("E:\\output\\lucence\\index"));
        // Now search the index:
        IndexReader ireader = IndexReader.open(directory); // read-only=true
        isearcher = new IndexSearcher(ireader);
        // Parse a simple query that searches for "text":
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "context", analyzer);
        Query query = parser.parse(key);
        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        // Iterate through the results:
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.getValues("id")[0] + "\t"
                    + hitDoc.getValues("context")[0] + "\t" + hits[i].score);
        }
    }

    public static void main(String[] args) throws IOException, ParseException {
        search("旧水泥袋");
        isearcher.close();
    }
}

Execution results (id, text, score):

4801857	采购旧编织袋、旧水泥袋	4.0172114
4829927	水泥	1.7585585
4903199	采购水泥电阻	1.0551351
4815595	求购水泥输送链条和提升机	0.70342344
4861233	1万5 潜水料啤酒手提包 手提袋	0.47982088
4815637	大量采购包装用的编织袋(新的旧的,有无商标皆可)	0.47913262
4915391	铁泥 铁灰	0.46250635
4889169	废旧砂轮	0.39993972
4903163	软陶泥,超轻粘土	0.34687978
4801611	水泵	0.30114633
4801911	手袋	0.29862976
4889443	水锈石 上水石  吸水石	0.2608004
4861275	足浴袋  泡脚袋 异形袋	0.25862095
4801871	手提袋制袋机	0.25339574
4915383	回收库存废旧油墨油漆	0.24996233
4903189	回收库存旧油漆13463048572	0.24996233
4903187	求购废旧油漆油墨13463048572	0.24996233
4903175	求购库存旧化工树脂	0.24996233
4903245	污水泵	0.24091707
4801705	出水霜	0.24091707
4874727	服裝紙袋	0.2389038
4829965	工作证袋	0.2389038
4815531	棉布袋	0.2389038
4815479	冷敷冰袋	0.2389038


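These search results show two things:

1. The default analyzer ends up splitting Chinese text into single characters.

2. The relevance scores of the matches are still fairly reasonable.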
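
The single-character behaviour can be imitated without Lucene. The sketch below is a simplified stand-in, not Lucene's StandardAnalyzer: it emits every Han character as its own token, groups runs of letters and digits into lowercase tokens, and counts how many query tokens a document shares. The names `UnigramSketch`, `tokenize`, and `overlap` are hypothetical. It is consistent with the results above, where a short entry such as 手袋 is returned for the query 旧水泥袋 because it shares the one character 袋.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Simplified imitation of single-character tokenization of Chinese text.
class UnigramSketch {

    // Each Han character becomes one token; letter/digit runs become one lowercase token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(run, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                run.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(run, tokens); // whitespace and punctuation end the current run
            }
        }
        flush(run, tokens);
        return tokens;
    }

    private static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    // Number of distinct query tokens that also occur in the document.
    static int overlap(String query, String doc) {
        List<String> docTokens = tokenize(doc);
        int shared = 0;
        for (String token : new LinkedHashSet<String>(tokenize(query))) {
            if (docTokens.contains(token)) {
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        String query = "\u65e7\u6c34\u6ce5\u888b";  // the query string used in the search example
        String cement = "\u6c34\u6ce5";             // "cement", two characters
        String handbag = "\u624b\u888b";            // "handbag", shares only the last query character
        System.out.println(tokenize(query).size() + " query tokens");
        System.out.println("cement shares " + overlap(query, cement));
        System.out.println("handbag shares " + overlap(query, handbag));
    }
}
```

Documents that share more characters with the query tend to rank higher in the results above, but the real score also depends on term frequency and field length, which this sketch ignores.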


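Designing a search engine with Lucene raises many questions:

1. How to design distributed querying;

2. How to handle incremental and full data updates without affecting the live query engine;

3. How to guarantee performance: better use of caching, distribution?

4. How to make the design general enough that adding fields, sort fields, or statistics fields can be done quickly when needed?

5. How to choose and handle the tokenization (word segmentation) module.

...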
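
For the second question, one common approach is to build the updated index off to the side and then switch readers over in a single atomic step, so that running queries never see a half-updated index. The sketch below shows only the switching step and is not taken from the original post: `SearcherHolder` is a hypothetical name, and the type parameter `S` stands in for whatever searcher object is used. Safely closing the old searcher while queries may still be using it needs extra bookkeeping, such as reference counting, which is omitted here.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder that lets an updater publish a new searcher atomically.
class SearcherHolder<S> {

    private final AtomicReference<S> current;

    SearcherHolder(S initial) {
        this.current = new AtomicReference<S>(initial);
    }

    // Query threads call this once per request and use the returned instance throughout.
    S acquire() {
        return current.get();
    }

    // The updater calls this after the new index is fully built; returns the previous
    // searcher so the caller can release it once no request is still using it.
    S swap(S fresh) {
        return current.getAndSet(fresh);
    }

    public static void main(String[] args) {
        // Strings stand in for real searcher objects in this demonstration.
        SearcherHolder<String> holder = new SearcherHolder<String>("searcher over index v1");
        System.out.println("serving with: " + holder.acquire());
        String old = holder.swap("searcher over index v2");
        System.out.println("retired: " + old);
        System.out.println("serving with: " + holder.acquire());
    }
}
```

The same holder works for both incremental and full rebuilds, since the query side only ever sees a complete searcher.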


I will look into these gradually in later posts.
