Abstract
The open-source search engine Nutch was developed for English-language environments and segments Chinese text only into single characters. To address this shortcoming, the paper analyzes Nutch's tokenizer and, building on Nutch's plug-in mechanism, integrates ICTCLAS, the Chinese lexical analysis system of the Chinese Academy of Sciences. This successfully gives Nutch word-level segmentation of Chinese and improves the development of Nutch-based Chinese search engines.
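The difference between single-character and word-level segmentation can be illustrated with a minimal sketch. It does not reproduce ICTCLAS or Nutch's plug-in API: greedy forward maximum matching is swapped in as a simple stand-in technique, and the function names and the toy dictionary are hypothetical, chosen only for illustration.

```python
def segment_single_chars(text):
    """Return one token per non-whitespace character (single-character segmentation)."""
    return [ch for ch in text if not ch.isspace()]


def segment_forward_max_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching against a word list.

    Assumption: a stand-in for a real segmenter, not the ICTCLAS algorithm.
    """
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"中文", "搜索", "引擎", "搜索引擎", "分词"}

if __name__ == "__main__":
    query = "中文搜索引擎"
    print(segment_single_chars(query))
    print(segment_forward_max_match(query, TOY_DICTIONARY))
```

With single-character tokens, an index can only match individual characters, whereas word-level tokens let a query such as "搜索引擎" be matched as one unit, which is the improvement the abstract describes.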
Source
Information Technology (《信息技术》)
2010, No. 2, pp. 97-100, 103 (5 pages in total)