Abstract
Most existing search result clustering methods are document-based and cannot generate meaningful cluster labels. To address this problem, this paper proposes a Chinese search result clustering method based on key noun phrase clustering. The method takes noun phrases extracted from the search results, together with related search terms, as candidate cluster labels; filters the candidates using the C-Value algorithm and IDF values; clusters the selected labels with the Chameleon algorithm; and finally assigns each search result to its most relevant cluster. Experiments show that using key noun phrases and related search terms as cluster labels effectively improves the descriptiveness of the labels and reduces the time complexity of the clustering algorithm.
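The label-filtering step can be illustrated with a minimal sketch, assuming the standard C-value definition (phrase length in tokens, log base 2, with a discount for phrases nested in longer candidates) and a plain log(N/df) IDF. The abstract does not say how the paper combines the two scores or which thresholds it uses, so the function names, the input format, and the choice to report the scores separately are all assumptions made for illustration.

```python
import math


def is_nested(short, long):
    """True if token tuple `short` occurs contiguously inside the longer tuple `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq):
    """C-value of each candidate phrase.

    freq maps a phrase (tuple of tokens) to its frequency in the search results.
    Assumed formula: log2(|a|) * f(a) for a non-nested phrase; for a nested one,
    f(a) is first reduced by the mean frequency of the longer candidates containing it.
    """
    scores = {}
    for a, f_a in freq.items():
        containers = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containers:
            f_a = f_a - sum(containers) / len(containers)
        scores[a] = math.log2(len(a)) * f_a
    return scores


def idf(phrase, docs):
    """Inverse document frequency; docs is a list of sets of phrases (hypothetical input format)."""
    df = sum(1 for d in docs if phrase in d)
    return math.log(len(docs) / df) if df else 0.0


# Hypothetical candidate frequencies, not data from the paper.
freq = {
    ("search", "result", "clustering"): 3,
    ("result", "clustering"): 5,
    ("noun", "phrase"): 4,
}
scores = c_value(freq)
```

Under this definition single-token candidates score zero because log2(1) = 0, so a practical system would need a variant such as log2(|a| + 1) or a separate rule for one-word related search terms.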
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2009, No. 31, pp. 118-121 (4 pages)
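Funding
National High Technology Research and Development Program of China (863 Program), No. 2006AA010105
National Natural Science Foundation of China, No. 60772081
Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (No. PXM2007_014224_044677, No. PXM2007_014224_044676)
Science and Technology Development Program of the Beijing Municipal Education Commission (No. KM200710772010)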