Abstract
To address the poor readability of results returned by traditional search engines, this paper studies search engine principles and clustering algorithms, and on that basis discusses in detail the architecture of a clustering search engine and the Lingo (Label INduction Grouping algOrithm) algorithm as applied to web page clustering. An automatic clustering system for Chinese web search results is implemented, with the interface design and extensibility design taking the particular characteristics of the Chinese-language environment into account. Comparative experiments indicate that a description-first clustering algorithm substantially improves the readability and understandability of the clustering results and their labels.
Source
Mind and Computation (《心智与计算》)
2010, Issue 4, pp. 250-257 (8 pages)
Keywords
clustering algorithm
search engine
description-first clustering
Lingo algorithm