Document Clustering Method Based on Association Link Network (cited by: 2)
Abstract: This paper proposes an adaptive-splitting document clustering method based on the association link network. Rather than requiring the number of clusters to be specified in advance, as traditional document clustering algorithms do, the method obtains the categories automatically by detecting the community structures in the association link network. In addition, to address the low similarity between documents, or between a document and a cluster, that results from high-dimensional, sparse word vectors, the associations between words in the association link network are mapped onto the relationships between documents and clusters, which strengthens those relationships. Experimental comparisons with other major clustering methods show that the proposed method not only clusters document collections accurately, but also determines the number of cluster centers fairly accurately and identifies the topics present in the collection.
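The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how the network is built or how communities are detected. The sketch assumes a plain word co-occurrence graph in place of the association link network and uses connected components of a thresholded graph as a crude substitute for community detection. The function names, the `min_weight` threshold, and the scoring rule are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs):
    """For each unordered word pair, count the documents containing both words.

    Assumption: simple co-occurrence stands in for the association link network.
    """
    weights = defaultdict(int)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    return weights


def detect_communities(weights, min_weight=2):
    """Return connected components over edges with weight >= min_weight.

    Assumption: a crude stand-in for a real community detection algorithm.
    The number of clusters falls out of the graph instead of being supplied.
    """
    neighbours = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, communities = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        communities.append(component)
    return communities


def association(weights, a, b):
    """Word-to-word association; a shared word counts as 1 (an assumption)."""
    if a == b:
        return 1
    key = (a, b) if a < b else (b, a)
    return weights.get(key, 0)


def assign(doc, communities, weights):
    """Assign a document to the community with the highest summed association.

    Scoring through word associations, not only exact word overlap, lets a
    document with no word in common with a community still relate to it.
    """
    scores = [
        sum(association(weights, w, v) for w in set(doc) for v in community)
        for community in communities
    ]
    return max(range(len(communities)), key=scores.__getitem__)


docs = [
    ["stock", "market", "bank"],
    ["stock", "market", "trade"],
    ["bank", "market", "stock"],
    ["goal", "match", "team"],
    ["team", "match", "coach"],
    ["goal", "team", "match"],
]
weights = build_word_graph(docs)
communities = detect_communities(weights)
labels = [assign(doc, communities, weights) for doc in docs]
print(len(communities), labels)
```

In this toy collection the word "trade" falls below the threshold and belongs to no community, yet a document containing only "trade" is still assigned to the finance community because "trade" co-occurs with "stock" and "market". This is the kind of strengthening the abstract describes, shown here only under the assumptions stated above.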
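Authors: 何祥 (He Xiang), 骆祥峰 (Luo Xiangfeng)
Source: Journal of Shanghai University: Natural Science Edition (《上海大学学报(自然科学版)》), 2014, No. 2, pp. 190-198 (9 pages). Indexed in CAS, CSCD, and PKU Core (北大核心).
Funding: National Natural Science Foundation of China (Grant No. 61071110)
Keywords: document clustering; association link network; community detection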
