
Lucene 3.0 Study Notes (2): index

2012-11-23 

1. IndexWriter

    IndexWriter writer = new IndexWriter(
            FSDirectory.open(new File("E:\\test\\index")), // FSDirectory.open takes a java.io.File
            new StandardAnalyzer(Version.LUCENE_CURRENT),
            true,
            IndexWriter.MaxFieldLength.LIMITED);

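The IndexWriter constructor takes four parameters:
(1). Directory dir: FSDirectory operates on a directory in the file system; RAMDirectory operates on a directory held in memory. FSDirectory is the more commonly used of the two.
(2). Analyzer a: performs lexical analysis and language processing on documents (StandardAnalyzer's support for Chinese word segmentation is not very good).
(3). boolean create: true creates a new index, overwriting any index already in the directory; false appends to the existing index.
(4). MaxFieldLength mfl: the maximum number of terms indexed per field. LIMITED caps this at 10,000 terms; UNLIMITED removes the cap.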
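
To make the remark about Chinese text in (2) concrete: as far as I know, the StandardAnalyzer in Lucene 3.0 emits each Chinese character as a separate token instead of grouping characters into words. The sketch below imitates that behaviour using only the JDK. It is not Lucene code, and the class name UnigramSketch and its tokenize method are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer, NOT Lucene code: mimics per-character tokens for Chinese text.
class UnigramSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                // Non-Chinese letters and digits accumulate into one lowercased word.
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                // Anything else ends the current word.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (han) {
                    // Each Han character becomes a token of its own.
                    tokens.add(new String(Character.toChars(cp)));
                }
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The escaped characters spell the two-character Chinese word for "study".
        System.out.println(tokenize("Lucene \u5b66\u4e60"));
    }
}
```

The Chinese word comes out as two unrelated one-character tokens, so no word boundaries are recognised. That is the usual reason for swapping in an analyzer written for Chinese.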

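2. Document

Document is a format that Lucene defines for itself. Lucene uses a Document to stand in for the corresponding physical file or for data held in a database, so a Document is simply the form in which source data is stored inside Lucene.
A Document only collects data from its sources, and different files can be built into the same Document. Once the user has turned the files into Documents, Lucene can find and use them quickly.
A single Document can hold several Fields at once. Lucene represents each data source with the Field class. Multiple Fields make up a Document, and multiple Documents make up an index.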
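
The relationship between Fields, Documents, and the index can be pictured with a toy inverted index. This is a minimal sketch using only the JDK: ToyIndex, addDocument, and lookup are hypothetical names, a "document" is just a map from field name to text, and Lucene's real index format is far more elaborate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model, NOT Lucene code: a document is a map of field name -> text.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    // "field:term" -> ids of the documents containing that term in that field
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Adds a document and returns its id (its position in the index). */
    int addDocument(Map<String, String> fields) {
        int docId = docs.size();
        docs.add(fields);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Crude analysis step: lowercase, then split on non-word characters.
            for (String term : field.getValue().toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term,
                            k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return docId;
    }

    /** Returns the ids of documents whose given field contains the term. */
    SortedSet<Integer> lookup(String field, String term) {
        SortedSet<Integer> ids = postings.get(field + ":" + term.toLowerCase());
        return ids == null ? new TreeSet<Integer>() : ids;
    }

    int numDocs() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> a = new LinkedHashMap<>();
        a.put("filename", "a.txt");
        a.put("contents", "Lucene builds an index");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("filename", "b.txt");
        b.put("contents", "Search the index with Lucene");
        index.addDocument(a);
        index.addDocument(b);
        System.out.println(index.lookup("contents", "lucene")); // prints [0, 1]
        System.out.println(index.lookup("contents", "search")); // prints [1]
    }
}
```

Keying the postings by field and term together is what lets a query target one field, such as "contents", without matching the same word in another field.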

    Document doc = new Document();
    // A Reader-valued field is tokenized and indexed, but its text is not stored.
    doc.add(new Field("contents", new FileReader(f)));
    // The file name is stored so that it can be read back from search results.
    doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);



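3. Optimizing and closing the index

    int numIndexed = writer.numDocs(); // number of documents currently in the index
    writer.optimize();                 // merge index segments to speed up searching
    writer.close();                    // commit changes and release the write lock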


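4. Searching

The search process is as follows:

(1). Create an IndexSearcher to run the search.
(2). Create an Analyzer to perform lexical analysis and language processing on the query string.
(3). Create a QueryParser to parse the syntax of the query string.
(4). QueryParser calls parse to analyze the query syntax, builds a query syntax tree, and stores it in a Query.
(5). IndexSearcher calls search on the Query syntax tree and gathers the results in a TopScoreDocCollector.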
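
The collecting step at the end of this process can be pictured as keeping only the N best-scoring hits while still counting every match. The following is a rough JDK-only sketch of that idea using a min-heap. TopNSketch and Hit are hypothetical names, and this is not how TopScoreDocCollector is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy collector, NOT Lucene code: keeps the n highest-scoring hits.
class TopNSketch {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int n;
    // Min-heap ordered by score: the weakest retained hit sits on top.
    private final PriorityQueue<Hit> heap;
    private int totalHits;

    TopNSketch(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
        this.heap = new PriorityQueue<>(n, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Called once per matching document. */
    void collect(int doc, float score) {
        totalHits++;
        if (heap.size() < n) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the weakest retained hit
            heap.add(new Hit(doc, score));
        }
    }

    /** Returns the retained hits, best score first. */
    List<Hit> topDocs() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }

    int getTotalHits() {
        return totalHits;
    }

    public static void main(String[] args) {
        TopNSketch collector = new TopNSketch(2);
        collector.collect(0, 0.3f);
        collector.collect(1, 0.9f);
        collector.collect(2, 0.5f);
        for (Hit hit : collector.topDocs()) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
        System.out.println("total hits: " + collector.getTotalHits());
    }
}
```

The total hit count and the number of retained hits are tracked separately, which is why a search can report more matches than the number of results it returns.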


    IndexSearcher is = new IndexSearcher(FSDirectory.open(indexDir), true); // read-only
    String field = "contents";
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field,
            new StandardAnalyzer(Version.LUCENE_CURRENT));
    Query query = parser.parse(q);
    TopScoreDocCollector collector = TopScoreDocCollector.create(TOP_NUM, false);
    long start = new Date().getTime(); // start time
    is.search(query, collector);






A complete example:
(1) Creating the index

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Date;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Indexer {
        private static String INDEX_DIR = "E:\\test\\index";  // directory where the index is stored
        private static String DATA_DIR = "E:\\test\\file\\";  // directory holding the files to index

        public static void main(String[] args) throws Exception {
            long start = new Date().getTime();
            int numIndexed = index(new File(INDEX_DIR), new File(DATA_DIR)); // call the index method
            long end = new Date().getTime();
            System.out.println("Indexing " + numIndexed + " files took "
                    + (end - start) + " milliseconds");
        }

        /**
         * Indexes the .txt files under dataDir, stores the index in indexDir,
         * and returns the number of files indexed.
         * @param indexDir
         * @param dataDir
         * @return int
         * @throws IOException
         */
        public static int index(File indexDir, File dataDir) throws IOException {
            if (!dataDir.exists() || !dataDir.isDirectory()) {
                throw new IOException(dataDir + " does not exist or is not a directory");
            }
            IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir),
                    new StandardAnalyzer(Version.LUCENE_CURRENT), true,
                    IndexWriter.MaxFieldLength.LIMITED);
            indexDirectory(writer, dataDir); // call the indexDirectory method
            int numIndexed = writer.numDocs(); // number of documents currently in the index
            writer.optimize();
            writer.close();
            return numIndexed;
        }

        /**
         * Walks the directory recursively and indexes every .txt file in it.
         * @param writer
         * @param dir
         * @throws IOException
         */
        private static void indexDirectory(IndexWriter writer, File dir) throws IOException {
            File[] files = dir.listFiles();
            if (files == null) {
                return; // listFiles returns null if the directory cannot be read
            }
            for (int i = 0; i < files.length; i++) {
                File f = files[i];
                if (f.isDirectory()) {
                    indexDirectory(writer, f); // recurse
                } else if (f.getName().endsWith(".txt")) {
                    indexFile(writer, f);
                }
            }
        }

        /**
         * Indexes a single .txt file.
         * @param writer
         * @param f
         * @throws IOException
         */
        private static void indexFile(IndexWriter writer, File f) throws IOException {
            if (f.isHidden() || !f.exists() || !f.canRead()) {
                return;
            }
            System.out.println("Indexing " + f.getCanonicalPath());
            Document doc = new Document(); // build an index document for the given file
            doc.add(new Field("contents", new FileReader(f)));
            doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES,
                    Field.Index.ANALYZED));
            writer.addDocument(doc); // add the document to the writer
        }
    }


(2) Searching

    import java.io.File;
    import java.util.Date;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class Searcher {
        private static String INDEX_DIR = "E:\\test\\index\\"; // path of the index
        private static String KEYWORD = "接受";                 // keyword to search for
        private static int TOP_NUM = 100;                      // show the top 100 results

        public static void main(String[] args) throws Exception {
            File indexDir = new File(INDEX_DIR);
            if (!indexDir.exists() || !indexDir.isDirectory()) {
                throw new Exception(indexDir + " does not exist or is not a directory.");
            }
            search(indexDir, KEYWORD); // call the search method to run the query
        }

        /**
         * Runs the query.
         *
         * @param indexDir
         * @param q
         * @throws Exception
         */
        public static void search(File indexDir, String q) throws Exception {
            IndexSearcher is = new IndexSearcher(FSDirectory.open(indexDir), true); // read-only
            String field = "contents";
            QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field,
                    new StandardAnalyzer(Version.LUCENE_CURRENT));
            Query query = parser.parse(q);
            TopScoreDocCollector collector = TopScoreDocCollector.create(TOP_NUM, false);
            long start = new Date().getTime(); // start time
            is.search(query, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;
            System.out.println(hits.length);
            for (int i = 0; i < hits.length; i++) {
                Document doc = is.doc(hits[i].doc); // new method is.doc()
                System.out.println(doc.getField("filename") + "   "
                        + hits[i].toString() + " ");
            }
            long end = new Date().getTime(); // end time
            System.out.println("Found " + collector.getTotalHits()
                    + " document(s) (in " + (end - start)
                    + " milliseconds) that matched query '" + q + "':");
        }
    }
