Semi-supervised K-Means Text Clustering Algorithm Using Class Associated Words
Cited by: 2
Abstract: This paper presents a method that classifies text documents by combining class associated words with the K-Means clustering algorithm. Class associated words are words or phrases that are related to, and representative of, the subject of a class. Initial cluster centroids are formed from the class associated words contained in the documents. During clustering, the information carried by the class associated words constrains the similarity comparison between each document to be classified and the cluster centroids, which speeds up the algorithm. Experiments demonstrate the effectiveness of the algorithm.
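The abstract gives only an outline of the method, so the following is one possible reading of it, not the authors' implementation. It assumes bag-of-words term-frequency vectors and cosine similarity, seeds each class centroid with the documents whose associated words point to that class alone, and, as a hypothetical form of the "constraint", assigns a document containing associated words of exactly one class directly to that class, compares a document matching several classes only against those classes' centroids, and compares a document with no associated words against every centroid. The function names (`constrained_kmeans`, `matched_classes`, etc.) and the toy data are illustrative.

```python
import math
from collections import Counter

def tf_vector(tokens):
    """Bag-of-words term-frequency vector stored as a sparse dict."""
    return dict(Counter(tokens))

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    """Component-wise mean of sparse vectors, used as a cluster centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}

def matched_classes(tokens, class_words):
    """Classes whose (hypothetical) associated-word sets occur in the document."""
    present = set(tokens)
    return [c for c, words in class_words.items() if present & words]

def constrained_kmeans(docs, class_words, max_iter=20):
    """Hypothetical K-Means variant guided by class associated words."""
    classes = list(class_words)
    vecs = [tf_vector(d) for d in docs]
    hints = [matched_classes(d, class_words) for d in docs]

    # Initial centroids: mean of documents whose associated words point to
    # exactly one class (an assumed seeding rule).
    centroids = {c: mean_vector([v for v, h in zip(vecs, hints) if h == [c]])
                 for c in classes}

    labels = [None] * len(docs)
    for _ in range(max_iter):
        new_labels = []
        for v, h in zip(vecs, hints):
            # Assumed constraint: only classes hinted by associated words are
            # candidates; documents without any hint are compared to all classes.
            candidates = h if h else classes
            if len(candidates) == 1:
                new_labels.append(candidates[0])  # no similarity computation needed
            else:
                new_labels.append(max(candidates, key=lambda c: cosine(v, centroids[c])))
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for c in classes:
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels

# Toy usage with made-up class associated words and pre-tokenized documents.
class_words = {"sports": {"football", "match"}, "finance": {"stock", "market"}}
docs = [
    ["football", "match", "goal", "team"],
    ["stock", "market", "price", "shares"],
    ["goal", "team", "player"],
    ["price", "shares", "investor"],
    ["match", "stock", "goal", "team"],
]
print(constrained_kmeans(docs, class_words))
# → ['sports', 'finance', 'sports', 'finance', 'sports']
```

Skipping the similarity computation for unambiguous documents is one way such a constraint could reduce the work per iteration, in line with the abstract's claim of faster execution; how the paper actually applies the constraint is not stated on this page.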
Source: 微计算机信息 (Control & Automation), 2010, No. 15, pp. 4-5 (2 pages)
Keywords: text clustering; text classification; class associated words; K-Means
References (6)

  • [1] MacQueen J. Some methods for classification and analysis of multivariate observations[C]. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1967: 281-297.
  • [2] Dhillon I S, Modha D S. Concept decompositions for large sparse text data using clustering[J]. Machine Learning, 2001, 42(1): 143-175.
  • [3] Suo Hongguang, Wang Yuwei. An improved k-means algorithm for text clustering[J]. Journal of Shandong University (Natural Science), 2008, 43(1): 60-64.
  • [4] Bradley P S, Fayyad U M. Refining initial points for k-means clustering[C]. Proceedings of the 15th International Conference on Machine Learning (ICML 98), 1998.
  • [5] Xing Xiaoshuai, Pan Jin, Jiao Licheng. A K-means clustering algorithm based on immune programming[J]. Chinese Journal of Computers, 2003, 26(5): 605-610.
  • [6] Yang Lihua, Dai Qi, Yang Zhanhua. Research on text classification technology[J]. 微计算机信息 (Control & Automation), 2006(05X): 209-211.
