Research on Multi-level Clustering for Web Search Results (cited by: 1)
Abstract: To make the results returned by search engines easier to browse, this paper proposes a new TF-IDF-based method for computing text similarity, together with a strategy for multi-level clustering of texts using an incremental (one-pass) clustering algorithm with approximately linear time complexity. It also proposes a strategy for extracting keywords from multiple texts: the nouns and noun phrases in a cluster are taken as candidate keywords; a weighting function that combines each candidate's term frequency, position of occurrence, and length with the text length computes its weight; and the candidate with the highest weight is automatically selected as the cluster keyword, with no manual intervention and no supporting corpus. Experimental results on collected Baidu and ODP corpora, and in a public test with users, show that the proposed method is effective.
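The abstract does not give the paper's similarity formula, clustering details, or weighting function, so the following Python sketch only illustrates the general pipeline it describes. It assumes standard smoothed TF-IDF with cosine similarity (not the paper's new measure), a plain one-pass clustering loop, and a purely hypothetical weighting function; all function names, the threshold value, and the assumption that input tokens are already nouns or noun phrases are illustrative choices, not the authors' method.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Standard smoothed TF-IDF vectors for tokenized documents (assumption:
    this is textbook TF-IDF, not the paper's modified similarity)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        size = max(len(doc), 1)
        vectors.append({
            t: (c / size) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def one_pass_cluster(vectors, threshold):
    """One-pass (incremental) clustering: each document is compared once
    against the current cluster centroids, then joins the best one or
    starts a new cluster. Returns a list of member-index lists."""
    centroids, members = [], []
    for i, vec in enumerate(vectors):
        best, best_sim = -1, -1.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(i)
            for t, w in vec.items():  # summed centroid; cosine ignores scale
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            centroids.append(dict(vec))
            members.append([i])
    return members


def keyword_weight(term, docs):
    """Hypothetical weight combining frequency, position, term length,
    and text length; the paper's actual function is not given here."""
    weight = 0.0
    for doc in docs:
        if term not in doc:
            continue
        freq = doc.count(term) / len(doc)            # normalized by text length
        position = 1.0 - doc.index(term) / len(doc)  # earlier scores higher
        weight += freq * position * len(term)        # longer terms score higher
    return weight


def top_keyword(docs):
    """Pick the highest-weighted candidate (tokens assumed to be nouns)."""
    candidates = {t for doc in docs for t in doc}
    return max(candidates, key=lambda t: (keyword_weight(t, docs), t))


if __name__ == "__main__":
    snippets = [
        ["jaguar", "car", "price"],
        ["jaguar", "car", "dealer"],
        ["jaguar", "cat", "habitat"],
        ["jaguar", "cat", "jungle"],
    ]
    clusters = one_pass_cluster(tfidf_vectors(snippets), threshold=0.3)
    for cluster in clusters:
        print(cluster, top_keyword([snippets[i] for i in cluster]))
```

Because each document is compared only against the existing centroids, the cost grows with the number of documents times the number of clusters, which is close to linear when the cluster count stays small. One plausible way to obtain further levels, again an assumption rather than the paper's stated procedure, is to rerun `one_pass_cluster` on the members of each cluster with a higher threshold.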
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; CSSCI, PKU Core), 2011, No. 5, pp. 464-470 (7 pages)
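Funding: National Natural Science Foundation of China (60673191); Natural Science Foundation of Guangdong Province (9151026005000002); Key Natural Science Research Project of Guangdong Higher Education Institutions (06Z012)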
Keywords: text clustering; multi-level clustering; cluster keyword extraction; weighting function