Abstract
Clustering search results helps users quickly browse the results returned by a search engine. To extract cluster labels that are readable and express topics well, and to obtain high-quality clustering results, a multi-level clustering approach based on noun phrases (MCNP) was proposed for search results. First, noun phrases are extracted as candidate cluster labels, and basic clusters are generated from the distribution of the candidate labels. Second, multi-level clustering is performed on the basic clusters using a one-pass clustering algorithm with linear time complexity. Finally, comparative experiments against a named-entity-based method and the STC and Lingo search results clustering algorithms show that the proposed approach outperforms all three methods in the readability and effectiveness of its cluster labels and in clustering performance.
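The pipeline described in the abstract (candidate labels, then basic clusters, then repeated one-pass clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the MCNP implementation from the paper: noun-phrase extraction is assumed to have been done already, so each document is given as a set of phrases; the support cutoff `min_support`, the Jaccard similarity between document sets, the ordering of basic clusters, and the per-level `thresholds` are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_basic_clusters(doc_phrases, min_support=2):
    """Group documents by shared candidate label (noun phrase).

    doc_phrases maps doc id -> set of noun phrases (assumed already extracted).
    Keeps labels found in at least `min_support` documents (hypothetical cutoff).
    """
    label_docs = defaultdict(set)
    for doc_id, phrases in doc_phrases.items():
        for phrase in phrases:
            label_docs[phrase].add(doc_id)
    basic = [(label, frozenset(docs)) for label, docs in label_docs.items()
             if len(docs) >= min_support]
    # Larger basic clusters first, ties broken alphabetically (an assumption).
    basic.sort(key=lambda item: (-len(item[1]), item[0]))
    return basic


def jaccard(a, b):
    """Jaccard similarity of two document-id sets (hypothetical choice)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def one_pass_cluster(items, threshold):
    """Single pass over items given as (labels, doc_set) pairs.

    Each item joins the most similar existing cluster if that similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: [list of labels, set of doc ids]
    for labels, docs in items:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(docs, cluster[1])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].extend(labels)
            best[1] = best[1] | docs
        else:
            clusters.append([list(labels), set(docs)])
    return [(labels, frozenset(docs)) for labels, docs in clusters]


def multi_level_cluster(doc_phrases, thresholds=(0.5, 0.2), min_support=2):
    """Apply the single pass level by level with a looser threshold each time,
    so every level groups the clusters of the level below it."""
    level = [([label], docs)
             for label, docs in build_basic_clusters(doc_phrases, min_support)]
    levels = [level]
    for t in thresholds:
        level = one_pass_cluster(level, t)
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Toy input: doc id -> noun phrases (made-up example data).
    docs = {
        1: {"jaguar car", "price"},
        2: {"jaguar car", "price"},
        3: {"jaguar car", "dealer"},
        4: {"jaguar animal", "rainforest"},
        5: {"jaguar animal", "rainforest"},
        6: {"dealer"},
    }
    levels = multi_level_cluster(docs)
    print([labels for labels, _ in levels[1]])
    # [['jaguar car', 'price'], ['dealer'], ['jaguar animal', 'rainforest']]
    print([labels for labels, _ in levels[2]])
    # [['jaguar car', 'price', 'dealer'], ['jaguar animal', 'rainforest']]
```

Each call to `one_pass_cluster` visits every item once and compares it only with the clusters formed so far, so its cost is proportional to the number of items times the number of clusters; with a bounded number of clusters this is linear in the input size. Lowering the threshold at each level makes higher levels coarser, which is one simple way to obtain a multi-level result.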
Source
Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》)
CAS
CSCD
Peking University Core Journal
2010, No. 7, pp. 39-44, 49 (7 pages)
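Funding
National Natural Science Foundation of China (60673191)
Key Natural Science Research Project of Guangdong Higher Education Institutions (06Z012)
Natural Science Foundation of Guangdong Province (9151026005000002)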
Keywords
information retrieval
search results clustering
text clustering
multi-level clustering