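Abstract
A method is proposed for classifying text documents using class associated words together with the K-Means clustering algorithm. Class associated words are words or phrases that are related to a class's subject and reflect it. Initial cluster centroids are formed from the class associated words that the documents contain. During clustering, the information carried by the class associated words is used to constrain the similarity comparisons between each document to be classified and the cluster centroids, which speeds up the algorithm. Experiments demonstrate the effectiveness of the algorithm.
An improved K-Means algorithm is presented for classifying text documents using class associated words. Class associated words are words or phrases that represent the subjects of classes. The initial cluster centroids are produced from prior knowledge of the class associated words. Class associated words found in the documents are used to supervise the clustering and improve the algorithm's performance. Experimental results show that the algorithm is effective.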
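The abstract does not give the paper's exact vector model, similarity measure, or seeding rule, so the following is only a minimal sketch of the idea under stated assumptions. It assumes raw term-frequency vectors and cosine similarity. It seeds each class centroid from the documents whose matched associated words belong to that class alone, falling back to the associated words themselves when no such document exists. During assignment, it compares a document only against the centroids of classes whose associated words it contains, and against all centroids when it contains none. The names `constrained_kmeans`, `candidate_classes`, `tf_vector`, `cosine`, and `mean_vector` are hypothetical, as is the toy data.

```python
from collections import Counter
import math


def tf_vector(tokens):
    """Raw term-frequency vector of a tokenized document (assumed model)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def candidate_classes(tokens, class_words):
    """Classes whose associated words occur in the document (hypothetical rule)."""
    present = set(tokens)
    return [c for c, words in class_words.items() if present & words]


def constrained_kmeans(docs, class_words, max_iter=20):
    """Sketch of K-Means seeded and constrained by class associated words."""
    vectors = [tf_vector(d) for d in docs]
    classes = list(class_words)

    # Seeding: average the documents that mention only this class's words;
    # if there are none, use the associated words as a pseudo-document.
    centroids = {}
    for c in classes:
        seeds = [v for d, v in zip(docs, vectors)
                 if candidate_classes(d, class_words) == [c]]
        centroids[c] = mean_vector(seeds) if seeds else {w: 1.0 for w in class_words[c]}

    labels = [None] * len(docs)
    for _ in range(max_iter):
        changed = False
        for i, (d, v) in enumerate(zip(docs, vectors)):
            # Constraint: skip centroids of classes the document never mentions.
            cands = candidate_classes(d, class_words) or classes
            best = max(cands, key=lambda c: cosine(v, centroids[c]))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # assignments are stable
        # Update: recompute each centroid from its current members.
        for c in classes:
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


# Toy usage example with made-up classes and documents.
class_words = {
    "sports": {"football", "match"},
    "finance": {"stock", "market"},
}
docs = [
    "the football match ended in a draw".split(),
    "stock prices rose as the market opened".split(),
    "fans cheered the team after the match".split(),
    "investors watched the bond yields and prices".split(),
]
print(constrained_kmeans(docs, class_words))
# prints ['sports', 'finance', 'sports', 'finance']
```

In this sketch only the last document contains no associated words, so it is the only one compared against every centroid. The shared terms "the" and "prices" place it in the finance cluster. Restricting the candidate set is where the speed-up described in the abstract would come from, because documents that contain associated words skip most centroid comparisons.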
Source
Microcomputer Information (《微计算机信息》)
2010, No. 15, pp. 4-5 (2 pages)
Control & Automation