
Notes on Chinese Word Segmentation with Sphinx Search

Date: 2009-11-08 11:13:20


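Sphinx Search is a high-performance full-text search package developed by the Russian programmer Andrew Aksyonoff.

Coreseek builds on Sphinx and enhances its support for Chinese. For details, see:
http://www.coreseek.cn/products/ft_feature/

LibMMSeg is a Chinese word segmentation package designed by Coreseek.com for the Sphinx full-text search engine. It is released under the GPL and implements Chih-Hao Tsai's MMSEG algorithm.

LibMMSeg is written in C++ and supports both Linux and Windows. Its segmentation speed is roughly 300 KB/s on a 1.2 GHz Pentium M. As of the current version (0.7.1), LibMMSeg has not been carefully optimized for speed, so there should still be room to improve segmentation throughput.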

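Installation and configuration:

1. Download the files
http://www.coreseek.cn/uploads/csft/3.1/Source/csft-3.1.tar.gz
http://www.coreseek.cn/uploads/csft/3.1/Source/mmseg-3.1.tar.gz
http://www.coreseek.com/uploads/sources/coreseek_fulltext_2.5.tar.gz
PS: the dictionary archive coreseek_fulltext_2.5.tar.gz no longer seems to be downloadable. Fortunately I kept a copy from an earlier download, so contact me if you need it.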

2. Install

yum install -y python python-devel glibc-devel expat expat-devel gcc-c++
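
Before building, it can help to confirm that the toolchain really is on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name missing_tools is hypothetical and is not part of Coreseek or Sphinx.

```shell
#!/bin/sh
# missing_tools TOOL...: print each named command that is not on the PATH.
# (Hypothetical helper for illustration; not shipped with Coreseek.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools that the build steps in this article rely on.
missing_tools gcc g++ make python tar
```

If nothing is printed, all of the listed tools were found.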

tar zxvf mmseg-3.1.tar.gz
cd mmseg-3.1
./configure --prefix=/usr/local/mmseg
make
make install
mkdir -p /usr/local/include/mmseg
cp /usr/local/mmseg/include/mmseg/csr_typedefs.h /usr/local/include/mmseg/

cd ..
tar zxvf csft-3.1.tar.gz
cd csft-3.1
./configure --prefix=/usr/local/csft --with-python --with-mysql=/usr/local/mysql/ --with-mysql-includes=/usr/local/mysql/include --with-mysql-libs=/usr/local/mysql/lib --with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib/
make
make install

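cd ..
tar zxvf coreseek_fulltext_2.5.2.tar.gz #use the name of the dictionary archive you actually have; the download link above calls it coreseek_fulltext_2.5.tar.gz
cp -a coreseek_fulltext_2.5.source/dict/ /usr/local/csft/

cd /usr/local/csft/etc/
vi sphinx.conf #edit the configuration; the comments in the default configuration file explain the options. A sample configuration is given below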

Error when running the indexer:
===========error============
/usr/local/csft/bin/indexer: error while loading shared libraries: libmysqlclient.so.16: cannot open shared object file: No such file or directory
Fix: ln -s /usr/local/mysql/lib/libmysqlclient.so.16 /usr/lib/libmysqlclient.so.16
==============================
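
To see which shared libraries the binary cannot resolve before choosing a fix, the output of ldd can be filtered. This is a sketch, assuming a glibc-based Linux where ldd marks unresolved entries with "not found"; missing_libs is a hypothetical helper name. As an alternative to the symlink, the MySQL library directory can be registered with the dynamic linker; that variant is left commented out because it needs root and assumes the system reads /etc/ld.so.conf.d/.

```shell
#!/bin/sh
# missing_libs: read `ldd` output on stdin and print the names of the
# libraries reported as "not found". (Hypothetical helper.)
missing_libs() {
    awk '/not found/ { print $1 }'
}

BIN=/usr/local/csft/bin/indexer   # path used in this article
if [ -x "$BIN" ]; then
    ldd "$BIN" | missing_libs
fi

# Alternative to the symlink (run as root), assuming the system reads
# /etc/ld.so.conf.d/*.conf:
#   echo /usr/local/mysql/lib > /etc/ld.so.conf.d/mysql.conf
#   ldconfig
```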

Build the full-text indexes:
/usr/local/csft/bin/indexer --config /usr/local/csft/etc/sphinx.conf --all #build all indexes
/usr/local/csft/bin/indexer --config /usr/local/csft/etc/sphinx.conf game #build a single index
Update the indexes while searchd is running:
/usr/local/csft/bin/indexer --config /usr/local/csft/etc/sphinx.conf game --rotate
/usr/local/csft/bin/indexer --config /usr/local/csft/etc/sphinx.conf --rotate --all
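
With --rotate, indexer writes the new index files alongside the live ones and then signals the running searchd to switch over, so searches keep working during a rebuild. A small wrapper makes this easier to schedule. The following is a sketch, assuming the paths used in this article; reindex_cmd is a hypothetical helper that only prints the command line.

```shell
#!/bin/sh
# Paths from this article; override through the environment if yours differ.
INDEXER="${INDEXER:-/usr/local/csft/bin/indexer}"
CONF="${CONF:-/usr/local/csft/etc/sphinx.conf}"

# reindex_cmd [INDEX...]: print the rotate command for the named indexes,
# or for every index when none is named. (Hypothetical helper.)
reindex_cmd() {
    if [ "$#" -eq 0 ]; then
        echo "$INDEXER --config $CONF --rotate --all"
    else
        echo "$INDEXER --config $CONF --rotate $*"
    fi
}

reindex_cmd game   # print only; nothing is executed here
```

To actually run the rebuild, hand the printed command to the shell, for example sh -c "$(reindex_cmd game)", either by hand or from a cron job.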
Start the searchd daemon:
/usr/local/csft/bin/searchd --config /usr/local/csft/etc/sphinx.conf
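
searchd records its process ID in the file named by pid_file in sphinx.conf, which gives a simple way to check whether the daemon is up. This is a sketch, assuming the pid_file path from the sample configuration in this article; searchd_running is a hypothetical helper name.

```shell
#!/bin/sh
# searchd_running PIDFILE: succeed if PIDFILE names a live process.
# (Hypothetical helper. kill -0 sends no signal; it only tests whether the
# process exists. Run it as root or as the user that owns searchd, otherwise
# a permission error is reported as "not running".)
searchd_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

PIDFILE=/usr/local/csft/var/log/searchd.pid   # pid_file value used in this article
if searchd_running "$PIDFILE"; then
    echo "searchd is running"
else
    echo "searchd is not running"
fi
```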

A sample sphinx.conf follows:

source game
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = 123
    sql_db = mydb
    sql_port = 3306
    sql_query_pre = SET NAMES gbk
    sql_query = \
        SELECT id, gid, net, cid, name, type, osid, hits_total, hits_today, hits_week, stat, UNIX_TIMESTAMP(entered) AS date_added \
        FROM gz_game_item
    sql_attr_uint = gid
    sql_attr_uint = net
    sql_attr_uint = cid
    sql_attr_uint = type
    sql_attr_uint = osid
    sql_attr_uint = hits_total
    sql_attr_uint = hits_today
    sql_attr_uint = hits_week
    sql_attr_uint = stat
    sql_attr_timestamp = date_added
    sql_ranged_throttle = 0
}

index game
{
    source = game
    path = /usr/local/csft/var/data/game
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    charset_type = zh_cn.gbk
    charset_dictpath = /usr/local/csft/dict
    min_prefix_len = 0
    min_infix_len = 0
    html_strip = 0
}

indexer
{
    mem_limit = 32M
}

searchd
{
    listen = 192.168.1.155:3312
    log = /usr/local/csft/var/log/searchd.log
    query_log = /usr/local/csft/var/log/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /usr/local/csft/var/log/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
}
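
Clients have to connect to the address given by listen, so scripts that talk to searchd can read it from the configuration rather than hard-coding it. The following is a sketch that extracts the first listen value, assuming one "key = value" setting per line as in the sample above; listen_addr is a hypothetical helper name.

```shell
#!/bin/sh
# listen_addr CONF: print the value of the first `listen = ...` line in CONF.
# (Hypothetical helper; it does not understand line continuations.)
listen_addr() {
    awk -F= '/^[ \t]*listen[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$1"
}

CONF=/usr/local/csft/etc/sphinx.conf   # path used in this article
if [ -r "$CONF" ]; then
    listen_addr "$CONF"
fi
```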

Article source: http://www.xinxilong.com

Author: unknown. Source: the web