Abstract
Drawing on the K-center problem and the concept of information gain, this paper applies a modified FPF (furthest-point-first) algorithm to Web search result clustering. It also proposes a cluster labeling method that makes the resulting clusters easier for users to understand, and presents a simple, intuitive model for evaluating cluster quality based on normalized mutual information (NMI). The approach was evaluated on results returned by a real Web search engine and compared with Lingo and K-means. The results show that the algorithm strikes a better balance between clustering quality and clustering time, and is therefore better suited to Web search result clustering.
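The abstract does not spell out the paper's modifications to FPF or its information-gain labeling, so the sketch below shows only the two textbook building blocks it names: the standard furthest-point-first heuristic for K-center (Gonzalez) and an NMI score for comparing a clustering against reference labels. Assumptions, not taken from the paper: points are dense numeric vectors compared with Euclidean distance (real search snippets would more likely be tf-idf vectors under cosine distance), the first point is used as the initial center, and NMI is normalized by the geometric mean of the two entropies (other normalizations exist). The function names `fpf` and `nmi` are hypothetical.

```python
import math
from collections import Counter


def fpf(points, k):
    """Standard furthest-point-first (Gonzalez) heuristic for K-center.

    Returns (centers, labels): the indices of the chosen centers and, for
    each point, the position (0..k-1) of its nearest center in `centers`.
    """
    n = len(points)
    k = min(k, n)
    centers = [0]  # assumption: the first point is the initial center
    # dist[i] = distance from point i to its closest center chosen so far
    dist = [math.dist(p, points[0]) for p in points]
    labels = [0] * n
    while len(centers) < k:
        # The next center is the point farthest from every current center.
        far = max(range(n), key=lambda i: dist[i])
        c = len(centers)
        centers.append(far)
        # Reassign points that are now closer to the new center.
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            if d < dist[i]:
                dist[i] = d
                labels[i] = c
    return centers, labels


def nmi(labels_true, labels_pred):
    """NMI = I(T;P) / sqrt(H(T) * H(P)), using natural logarithms."""
    n = len(labels_true)
    ct = Counter(labels_true)
    cp = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information: sum of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    mi = sum(c / n * math.log(c * n / (ct[a] * cp[b]))
             for (a, b), c in joint.items())
    h_t = -sum(c / n * math.log(c / n) for c in ct.values())
    h_p = -sum(c / n * math.log(c / n) for c in cp.values())
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one partition has a single cluster.
        return 1.0 if h_t == h_p else 0.0
    return mi / math.sqrt(h_t * h_p)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20), (6, 20)]
    centers, labels = fpf(pts, 3)
    print(centers, labels)
    print(nmi(["a", "a", "b", "b", "c", "c"], labels))
```

On the toy data the three well-separated pairs end up in three clusters, and NMI against the reference labels comes out at (numerically) 1.0, since NMI ignores how cluster ids are named. FPF needs only O(nk) distance computations, which is one reason it is attractive when clustering must run at query time.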
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2008, No. 10, pp. 3125-3127 (3 pages)
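Funding
National High Technology Research and Development Program of China ("863" Program) (2004AA1Z2520)
Military Network Interconnection and Information Security Strategy Research Project (2006QB1069)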