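A workaround for grouped queries in Lucene (similar to SQL's GROUP BY)
Goal: remove duplicate results from a Lucene search.
I searched online for a long time without finding an answer, so I went to the Apache documentation: http://lucene.apache.org/java/2_4_1/queryparsersyntax.html
The query syntax has nothing like GROUP BY. So I tried a different angle and thought of filters.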
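That led me to the class org.apache.lucene.search.DuplicateFilter. Its documentation says: "Full" processing mode starts by setting all bits to false and only setting bits for documents that contain the given field and are identified as none-duplicates. In other words, this filter guarantees that each value of the chosen field shows up only once in the search results. That gives something similar to SQL's GROUP BY (it is not quite the same thing: all I needed was to drop duplicate results, but with some modification this approach can also cover other GROUP BY features).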
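To make the quoted description more concrete, here is a minimal plain-Java sketch of the idea. It is not Lucene code: the class name FullModeSketch and the method keepLastOccurrence are made up for illustration, and keeping the last document per value is an assumption of this sketch, not something the quote states.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustration only; hypothetical names, no Lucene classes involved.
class FullModeSketch {

    // values[i] is the field value of document i, or null if the field is missing.
    // Returns a bit set with exactly one bit per distinct value.
    static BitSet keepLastOccurrence(String[] values) {
        Map<String, Integer> lastDoc = new HashMap<>();
        for (int docId = 0; docId < values.length; docId++) {
            if (values[docId] != null) {
                // Assumption: a later document replaces an earlier one with the same value.
                lastDoc.put(values[docId], docId);
            }
        }
        BitSet bits = new BitSet(values.length); // all bits start out false
        for (int docId : lastDoc.values()) {
            bits.set(docId); // only the kept document of each value is switched on
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] link = {"a", "133700", "133700", "133700", "b"};
        System.out.println(keepLastOccurrence(link)); // prints {0, 3, 4}
    }
}
```

A search then only returns documents whose bit is set, which is why three documents sharing one value shrink to a single result.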
Enough talk. Here is an example to study for yourself.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DuplicateFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

// Written against the Lucene 2.4.1 API; the class name is only a placeholder.
public class DuplicateFilterDemo {
    public static void main(String[] args) throws Exception {
        // Build a small in-memory index.
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);
        // The array contains the duplicate value 133700 three times.
        String[] link = {"",
            "shtml#Ayi:263791429",
            "133700",
            "133700",
            "133700",
            "#Ayi:468534543",
            "#Ayi:-992539968",
            "#Ayi:442193484"};
        // Not used below: the plink field is filled with "a" + i instead.
        String[] parentLink = {"110905.shtml",
            "110905.shtml",
            "110905.shtml",
            "110905.shtml",
            "905.shtml",
            "5.shtml",
            "110905.shtml",
            "1"};
        // One document per link value, each with a unique plink.
        for (int i = 0; i < link.length; i++) {
            Document doc = new Document();
            doc.add(new Field("link", link[i], Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("plink", "a" + i, Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        writer.optimize();
        writer.close();

        IndexSearcher indexSearcher = new IndexSearcher(directory);
        QueryParser queryParser = new QueryParser("link", new StandardAnalyzer());
        String xsfd = "link:(133700)";
        // Create the DuplicateFilter; the argument is the name of the field to de-duplicate on.
        Filter filter = new DuplicateFilter("link");
        Query query = queryParser.parse(xsfd);
        Hits hits = indexSearcher.search(query, filter);
        // Number of hits left after the filter has been applied.
        System.out.println(hits.length());
        for (int j = 0; j < hits.length(); j++) {
            Document doc = hits.doc(j);
            System.out.println(doc.get("link"));
        }
        indexSearcher.close();
    }
}
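The example only removes duplicates. For the "other GROUP BY features" mentioned earlier, such as a count per group, one possible direction is to run the query without the filter, read the stored field value of every hit, and count the values in ordinary Java. The sketch below shows just that counting step on a plain array. GroupCountSketch and countByValue are hypothetical names, and this is only one way to do it, not something DuplicateFilter provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only; hypothetical names, no Lucene classes involved.
class GroupCountSketch {

    // Counts how often each value occurs, keeping first-seen order,
    // roughly like SELECT link, COUNT(*) ... GROUP BY link.
    static Map<String, Integer> countByValue(String[] values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the stored "link" values of the unfiltered hits.
        String[] link = {"133700", "133700", "133700", "#Ayi:468534543"};
        System.out.println(countByValue(link)); // prints {133700=3, #Ayi:468534543=1}
    }
}
```

For large result sets, reading every stored document this way can be slow, so treat it as a starting point for small indexes.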
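Note: DuplicateFilter is not in the Lucene core jar; it lives in lucene-queries-2.4.1.jar. If you cannot find that jar, download the Lucene source and look under contrib\queries. It really is not easy to discover.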
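A quick way to see whether the queries jar is actually on your classpath is to try loading the class by name. This is a small sketch using only the JDK; ClasspathCheck and isOnClasspath are made-up names.

```java
// Illustration only; hypothetical helper built on the standard Class.forName call.
class ClasspathCheck {

    // Returns true if the named class can be found by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // false = do not run static initializers, we only want to know if it exists
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.search.DuplicateFilter";
        System.out.println(name + " available: " + isOnClasspath(name));
    }
}
```

If this reports false, the queries jar is most likely missing from the classpath.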