
A Topical Document Clustering Method (一种基于主题的文本聚类方法)

Cited by: 23
Abstract: Few existing document clustering methods can correctly identify or describe document topics, which makes topic-based clustering difficult. This paper introduces a novel topical document clustering method, Linguistic Features Indexing Clustering (LFIC), which identifies document topics accurately and clusters documents according to those topics. LFIC defines and extracts "topic elements" and uses them to index base clusters; it also exploits linguistic features. Experimental results show that LFIC achieves a clustering precision of 94.66%, higher than several widely used traditional clustering methods.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 2, pp. 58-62 (5 pages)
Funding: National Natural Science Foundation of China (60575042, 60503072, 60675034); Tencent Fund
Keywords: artificial intelligence; pattern recognition; topical document clustering; base cluster indexing; linguistic features

