A Clustering Method for Web Search Results Based on Suffix Tree (cited by: 5)
Abstract: To meet the requirements of Web search result clustering for relevance, speed, and browsable cluster descriptions at the same time, this paper proposes a Suffix Tree Clustering (STC) algorithm suited to Chinese Web search results. The suffix tree is built with the Chinese character as its basic unit. An effective strategy resolves the problem of describing clusters after phrase clusters have been merged using the binary similarity measure, and synonymous phrase clusters are merged according to their semantic similarity, which improves the quality of the clustering results. Experiments show that, compared with traditional document clustering algorithms, the suffix-tree-based algorithm achieves better precision and efficiency on Web document clustering.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2010, No. 1, pp. 78-83 (6 pages)
Funding: National Natural Science Foundation of China (70771019).
Keywords: Web search, suffix tree, document clustering
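
The pipeline summarized in the abstract (character-level phrases, phrase clusters, merging by a binary similarity measure) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: a dictionary of character substrings stands in for the suffix tree, so the sketch yields the same phrase-to-document mapping on short snippets but without linear-time construction; the 0.5 overlap threshold and the connected-component merge follow the classic STC formulation; the function names and the `min_len`, `max_len`, `min_docs` parameters are hypothetical. The semantic merging of synonymous phrase clusters is omitted because the abstract does not specify how it is computed.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, min_len=2, max_len=4, min_docs=2):
    """Map each character phrase to the set of documents that contain it.

    Assumption: plain substring enumeration replaces the suffix tree.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for start in range(len(text)):
            for length in range(min_len, max_len + 1):
                if start + length > len(text):
                    break
                phrase_docs[text[start:start + length]].add(doc_id)
    # Keep only phrases shared by at least min_docs documents.
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def similar(a, b, threshold=0.5):
    """Binary similarity: True only if the overlap exceeds the threshold
    relative to both document sets (threshold value is an assumption)."""
    overlap = len(a & b)
    return overlap / len(a) > threshold and overlap / len(b) > threshold


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters that are connected in the similarity graph."""
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(phrases, 2):
        if similar(clusters[p], clusters[q], threshold):
            parent[find(p)] = find(q)

    merged = defaultdict(lambda: {"phrases": [], "docs": set()})
    for p in phrases:
        root = find(p)
        merged[root]["phrases"].append(p)
        merged[root]["docs"] |= clusters[p]
    return list(merged.values())
```

Calling `merge_clusters(base_clusters(snippets))` on a list of snippet strings returns one dictionary per merged cluster, holding its document indices and all phrases that could serve as its description. Choosing a single readable label from those phrases is the cluster description problem the abstract refers to, and this sketch does not attempt it.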
