A Text Clustering Method Based on Part of Speech and Improved Center Selection (基于词性和中心点改进的文本聚类方法) Cited by: 6
Abstract To address the sensitivity of the k-means algorithm to its initial points and its tendency to converge to a local optimum, this paper proposes STICS, a text clustering method based on part of speech and improved center selection. STICS improves clustering in three ways: it refines the semantic representation of text, optimizes the selection of cluster centers, and eliminates the negative influence of isolated points (outliers). To represent text, STICS takes into account how much features of different parts of speech contribute to a document and uses a weighted vector space model (VSM). To select centers, it first measures the sample average similarity of every sample and then chooses the sample with the largest sample average similarity as the first cluster center. Both theoretical analysis and experimental results indicate that the proposed method yields better clustering results.
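Source Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list, 2012, No. 6, pp. 996-1001 (6 pages)
Funding Supported by the National Natural Science Foundation of China (No. 60970107)
Keywords text clustering; k-means; part-of-speech features; sample average similarity; outliers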
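
The three ideas named in the abstract (part-of-speech weighting of the vector space model, choosing the first center by sample average similarity, and setting aside isolated points) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the weight table `POS_WEIGHTS`, the use of cosine similarity, the `outlier_threshold` parameter, and all function names are assumptions introduced here for illustration. How the remaining centers are chosen is not stated in the abstract and is left out.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights; the abstract does not state the real values.
POS_WEIGHTS = {"noun": 1.0, "verb": 0.6, "adj": 0.4}
DEFAULT_WEIGHT = 0.2  # assumed weight for any other part of speech


def weighted_vector(tagged_tokens):
    """Build a sparse term vector in which each occurrence counts by its POS weight."""
    vec = Counter()
    for term, pos in tagged_tokens:
        vec[term] += POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors (an assumed similarity measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sample_average_similarity(vectors):
    """Mean similarity of each sample to every other sample."""
    n = len(vectors)
    if n < 2:
        return [0.0] * n
    return [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def first_center_and_outliers(vectors, outlier_threshold=0.05):
    """Pick the sample with the largest average similarity as the first center.

    Samples whose average similarity falls below `outlier_threshold`
    (an assumed criterion) are reported as isolated points.
    """
    avg = sample_average_similarity(vectors)
    first_center = max(range(len(vectors)), key=avg.__getitem__)
    outliers = [i for i, s in enumerate(avg) if s < outlier_threshold]
    return first_center, outliers, avg
```

A caller would tag each document with a part-of-speech tagger of its choice, pass the `(term, tag)` pairs through `weighted_vector`, and hand the resulting vectors to `first_center_and_outliers`. The quadratic cost of the pairwise similarities is acceptable for a sketch but would need attention on a large corpus.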