Abstract
This paper presents a search result clustering system that clusters the results returned by Nutch in both Chinese and English environments. Built on the k-means algorithm and the suffix tree clustering algorithm, the system consists of a Nutch search engine module, a text word segmentation module, a TF-IDF weight calculation module, and a text clustering module. The k-means algorithm and the suffix tree clustering algorithm are compared experimentally.
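As a rough illustration of the pipeline named in the abstract (segmentation, TF-IDF weighting, then clustering), the following is a minimal sketch and not the paper's implementation. The whitespace tokenizer, the `log(N/df) + 1` IDF variant, the farthest-point seeding, and the sample snippets are all assumptions made for this example. The Nutch integration, Chinese word segmentation, and the suffix tree clustering algorithm are not sketched.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for the
    # paper's word segmentation module (Chinese text needs a real segmenter).
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one unit-length TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Assumption: one common IDF variant; the paper's exact formula is not given.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        vec = [counts[t] / length * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, iterations=20):
    """Plain k-means; returns a cluster label for each vector."""
    # Assumption: deterministic farthest-point seeding instead of random seeds.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical search-result snippets, invented for this example.
snippets = [
    "apache nutch web crawler",
    "nutch crawler open source search",
    "k-means clustering algorithm",
    "suffix tree clustering algorithm",
]
labels = k_means(tf_idf_vectors(snippets), k=2)
print(labels)
```

With these four snippets, the two crawler-related results should fall into one cluster and the two clustering-algorithm results into the other, since the two groups share no terms.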
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2011, Issue 5, pp. 118-122 (5 pages)
Computer Engineering and Applications