
A First Look at Chinese Full-Text Search with coreseek

2012-07-16 
Official site: http://www.coreseek.cn. The documentation on the official site is the best introductory material.

1. Download coreseek
http://www.coreseek.cn/news/14/65/
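2. Install
[stevelee@liyuanchun share]$ cd /usr/local/share
[stevelee@liyuanchun share]$ sudo mv ~/下载/coreseek-4.1-beta.tar.gz ./
[stevelee@liyuanchun share]$ sudo tar xzf coreseek-4.1-beta.tar.gz
[stevelee@liyuanchun share]$ cd coreseek-4.1-beta
## Install mmseg
[stevelee@liyuanchun coreseek-4.1-beta]$ cd mmseg-3.2.14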
[stevelee@liyuanchun mmseg-3.2.14]$ ./bootstrap
[stevelee@liyuanchun mmseg-3.2.14]$ ./configure --prefix=/usr/local/share/mmseg3
This fails with the error: config.status: error: cannot find input file: `src/Makefile.in'
Following advice found online, running the commands below and then configuring again fixed it:
[stevelee@liyuanchun mmseg-3.2.14]$ aclocal
[stevelee@liyuanchun mmseg-3.2.14]$ libtoolize --force
[stevelee@liyuanchun mmseg-3.2.14]$ automake --add-missing
[stevelee@liyuanchun mmseg-3.2.14]$ autoconf
[stevelee@liyuanchun mmseg-3.2.14]$ autoheader
[stevelee@liyuanchun mmseg-3.2.14]$ make clean
[stevelee@liyuanchun mmseg-3.2.14]$ ./configure --prefix=/usr/local/share/mmseg3
[stevelee@liyuanchun mmseg-3.2.14]$  make
[stevelee@liyuanchun mmseg-3.2.14]$  sudo make install
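The manual aclocal / libtoolize / automake / autoconf / autoheader sequence above can usually be collapsed into one autoreconf call. The following is only a sketch under assumptions, not something verified against mmseg 3.2.14: the function names needs_regen and regen_build_system are made up for illustration, and autoconf, automake and libtool are assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helpers; not part of mmseg or coreseek.

# Succeeds when src/Makefile.in is missing, which is the situation in
# which ./configure stops with "cannot find input file".
needs_regen() {
    [ ! -f "$1/src/Makefile.in" ]
}

# Regenerate the autotools files in the given source directory.
# autoreconf runs aclocal, libtoolize, autoconf, autoheader and automake
# as needed; --force rebuilds files even if they look up to date and
# --install copies in missing auxiliary files.
regen_build_system() {
    src_dir="$1"
    if needs_regen "$src_dir"; then
        (cd "$src_dir" && autoreconf --force --install)
    fi
}

# Usage (assumed location of the mmseg sources):
#   regen_build_system /usr/local/share/coreseek-4.1-beta/mmseg-3.2.14
```

After regenerating, ./configure, make and sudo make install are run as before.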
## Install coreseek
[stevelee@liyuanchun mmseg-3.2.14]$  cd ../csft-4.1/
[stevelee@liyuanchun csft-4.1]$  sh buildconf.sh
[stevelee@liyuanchun csft-4.1]$  ./configure --prefix=/usr/local/share/coreseek  --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/share/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/share/mmseg3/lib/ --with-mysql
[stevelee@liyuanchun csft-4.1]$  make
[stevelee@liyuanchun csft-4.1]$  sudo make install

3. Test coreseek
[stevelee@liyuanchun csft-4.1]$  cd ../testpack
[stevelee@liyuanchun testpack]$  cat var/test/test.xml
[stevelee@liyuanchun testpack]$  /usr/local/share/mmseg3/bin/mmseg -d /usr/local/share/mmseg3/etc var/test/test.xml

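After running the commands above, the Chinese text should be displayed correctly. Next, edit the etc/csft.conf configuration file and change every relative path to the actual absolute path: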
xmlpipe_command = cat /usr/local/share/coreseek-4.1-beta/testpack/var/test/test.xml
path            = /usr/local/share/coreseek-4.1-beta/testpack/var/data/xml
charset_dictpath= /usr/local/share/mmseg3/etc/
pid_file = /usr/local/share/coreseek-4.1-beta/testpack/var/log/searchd_xml.pid
log = /usr/local/share/coreseek-4.1-beta/testpack/var/log/searchd_xml.log
query_log = /usr/local/share/coreseek-4.1-beta/testpack/var/log/query_xml.log
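To check that no relative path was missed, a small grep can list the settings whose value does not start with "/". This is a minimal sketch: the helper name find_relative_paths is hypothetical, and the list of setting names covers only the ones shown above. xmlpipe_command is left out on purpose, because its value is a command line rather than a single path and has to be checked by eye.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) every config line whose
# path-like setting has a value that does not begin with "/".
# Exit status follows grep: 0 if such lines exist, 1 if none were found.
find_relative_paths() {
    grep -nE '^[[:space:]]*(path|log|query_log|pid_file|charset_dictpath)[[:space:]]*=[[:space:]]*[^/[:space:]]' "$1"
}

# Usage (run from the testpack directory):
#   find_relative_paths etc/csft.conf || echo "no relative paths left"
```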

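[stevelee@liyuanchun testpack]$  /usr/local/share/coreseek/bin/indexer -c etc/csft.conf --all
PS: If the error "ERROR: index 'xml': failed to configure some of the sources" appears, the expat (XML parsing library) development package needs to be installed. I am on Fedora 15, where the command below is enough. After installing expat, coreseek must be recompiled: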
[stevelee@liyuanchun testpack]$  sudo  yum install  expat-devel*
[stevelee@liyuanchun testpack]$  cd ../csft-4.1
[stevelee@liyuanchun csft-4.1]$  sudo make clean
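Then repeat the coreseek build steps from section 2 (sh buildconf.sh, ./configure, make, sudo make install), and run the indexer and a test search again:
[stevelee@liyuanchun testpack]$  /usr/local/share/coreseek/bin/indexer -c etc/csft.conf --all
[stevelee@liyuanchun testpack]$  /usr/local/share/coreseek/bin/search -c etc/csft.conf 网络搜索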

4. Chinese search test against a MySQL database
[stevelee@liyuanchun testpack]$  cd /usr/local/share/coreseek
[stevelee@liyuanchun coreseek]$  sudo cp  ./etc/sphinx.conf.dist  ./etc/sphinx.conf
[stevelee@liyuanchun coreseek]$  sudo vi ./etc/sphinx.conf
Edit the source section:
source document_content_src
sql_user                = app
sql_pass                = 12345
### Note: the password must not contain special characters. The original password was 12345 %$#@! and building the index failed with:
### ERROR: index 'posts_content_index': sql_connect: Access denied for user 'app'@'localhost' (using password: YES) (DSN=mysql://app:***@localhost:3306/futureWeb_allan).
sql_db                  = test
sql_query_pre           = SET NAMES utf8
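A likely reason for the password problem noted on sql_pass is that sphinx.conf treats "#" as the start of a comment and "\" as an escape or line-continuation character, so a value such as 12345 %$#@! would be cut off at the "#". This explanation is an assumption on my part and was not confirmed in the original test. The sketch below only flags such values before they go into the config; the function name has_conf_special_chars is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the value contains a character
# that needs care in sphinx.conf, namely "#" or a backslash.
has_conf_special_chars() {
    case "$1" in
        *'#'*|*'\'*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Usage:
#   if has_conf_special_chars "$password"; then
#       echo "password contains # or \\; change it or escape it" >&2
#   fi
```

Changing the password to plain alphanumerics, as done above, sidesteps the issue entirely.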


Edit the index section:
index document_content_index
source                  = document_content_src
path                    = /usr/local/share/coreseek/var/data/document_content_src
docinfo                 = extern
mlock                   = 0
morphology              = none


# stopwords
min_word_len            = 1
charset_type            = zh_cn.utf-8
charset_dictpath = /usr/local/share/mmseg3/etc/
ngram_len               = 0
html_strip              = 0

Import the data. At the mysql command line, enter:
mysql> source /home/stevelee/example-chiese (主键为integer).sql


Before building the index, check whether the search daemon is already running:
[stevelee@liyuanchun coreseek]$ sudo ps -ef |grep searchd
If a process is listed, stop it (kill also works):
[stevelee@liyuanchun coreseek]$ sudo ./bin/searchd -c /usr/local/share/coreseek/etc/sphinx.conf --stop
[stevelee@liyuanchun coreseek]$ sudo /usr/local/share/coreseek/bin/indexer --config /usr/local/share/coreseek/etc/sphinx.conf --all
Once the index has been built successfully, start the search daemon:
[stevelee@liyuanchun coreseek]$ sudo /usr/local/share/coreseek/bin/searchd -c /usr/local/share/coreseek/etc/sphinx.conf
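The stop / reindex / start cycle above can be wrapped in one function. This is a sketch rather than a tested deployment script: the names rebuild_index, CORESEEK_HOME and CONF are introduced here for illustration, pgrep is used in place of ps -ef | grep, and the function would need to be run with the same privileges (sudo) as the commands above.

```shell
#!/bin/sh
# Assumed install prefix and config path, taken from the steps above.
CORESEEK_HOME="${CORESEEK_HOME:-/usr/local/share/coreseek}"
CONF="${CONF:-$CORESEEK_HOME/etc/sphinx.conf}"

# Hypothetical helper: stop searchd if it is running, rebuild all
# indexes, then start searchd again. Stops at the first failing step.
rebuild_index() {
    # pgrep -x matches the exact process name "searchd".
    if pgrep -x searchd >/dev/null 2>&1; then
        "$CORESEEK_HOME/bin/searchd" -c "$CONF" --stop || return 1
    fi
    "$CORESEEK_HOME/bin/indexer" --config "$CONF" --all || return 1
    "$CORESEEK_HOME/bin/searchd" -c "$CONF"
}

# Usage:
#   rebuild_index && echo "index rebuilt, searchd started"
```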
Search test:
[stevelee@liyuanchun coreseek]$  /usr/local/share/coreseek/bin/search -c /usr/local/share/coreseek/etc/sphinx.conf 我
Output:
using config file '/usr/local/share/coreseek/etc/sphinx.conf'...

index 'document_content_index': query '我 ': returned 2 matches of 2 total in 0.000 sec
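For scripted checks it can be handy to pull the two numbers out of the summary line shown above. The following is a minimal sketch that assumes the summary line keeps the "returned N matches of M total" wording; the function name match_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the output of the search tool on stdin and
# print "<returned> <total>" taken from the first summary line.
match_counts() {
    awk '/returned [0-9]+ matches of [0-9]+ total/ {
        for (i = 1; i <= NF; i++)
            if ($i == "returned") { print $(i + 1), $(i + 4); exit }
    }'
}

# Usage:
#   /usr/local/share/coreseek/bin/search -c /usr/local/share/coreseek/etc/sphinx.conf 我 | match_counts
```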
