Abstract
Most search engines present results as a ranked list of document snippets. As the number of Web documents grows rapidly, it becomes increasingly difficult for users to find relevant information in such lists. One remedy is to group the snippets returned by the search engine into clusters, which lets users browse the results of a query at the topic level and locate the information they are interested in quickly. This paper first introduces the key requirements that clustering of Web search results places on clustering algorithms, together with a classification of those algorithms. It then analyzes the characteristics of the major clustering algorithms and their improved variants, and discusses the evaluation of clustering quality. Finally, it outlines the development trends of clustering-based browsing of search engine results.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2008, Issue 3, pp. 56-63 (8 pages)
Funding
National Natural Science Foundation of China (Grant No. 60603098)
Keywords
computer application
Chinese information processing
search engine
document clustering
information retrieval
cluster label