Abstract
Based on an analysis of the news industry and of the information requirements specific to news websites, this paper studies Chinese word segmentation algorithms and full-text retrieval frameworks and designs a vertical search engine that performs data collection and retrieval with multiple threads. An efficient retrieval system is then built from the Pangu word segmentation component (盘古分词) and Lucene. Test runs on small and medium-sized news websites indicate that the system meets search-engine requirements for query accuracy and fast response, has relatively strong processing capacity, and improves the user experience.
Source
Journal of Lishui University (《丽水学院学报》)
2012, No. 5, pp. 66-69 (4 pages)
Keywords
news websites
Pangu segment
search system
vertical search engine