Optimizing Initial Cluster Centroids by SVD in K-means Algorithm for Chinese Text Clustering (cited by: 10)
Abstract Traditional K-means clustering has several weaknesses: the number of clusters K is hard to preset accurately, the result depends on the initial centers, and the algorithm is sensitive to noise points and unstable. In text clustering, vectorized text data are additionally high-dimensional and sparsely distributed, and they carry a latent semantic structure. To address these problems, the paper proposes SVD-Kmeans, an optimized Chinese text clustering algorithm that uses the physical meaning of Singular Value Decomposition (SVD) to perform a rough classification and then applies K-means. The algorithm uses the mathematical meaning of SVD to smooth the text data, retaining semantic features and removing noise, and uses the physical meaning of SVD to roughly classify the text data, taking the classification result as the initial cluster centers for K-means. Experimental results show that, compared with other K-means algorithms and their improved variants, SVD-Kmeans clearly improves clustering quality as measured by F-Measure.
Authors Dai Yueming, Wang Minghui, Zhang Ming, Wang Yan (Engineering Research Center of Internet of Things Technology Applications, Ministry of Education, Jiangnan University, Wuxi 214122, China)
Source Journal of System Simulation (indexed in CAS, CSCD, Peking University Core Journals), 2018, Issue 10, pp. 3835-3842 (8 pages)
Funding National Natural Science Foundation of China (61572238); Jiangsu Province Outstanding Youth Fund (BK20160001)
Keywords SVD; text clustering; K-means; initial center point
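
The pipeline outlined in the abstract (smooth the data with SVD, group documents roughly, then seed K-means with the group means) can be illustrated with a minimal NumPy sketch. The abstract does not say how the rough classification is carried out, so the rule used here is an assumption: each document is assigned to the latent dimension on which its projection is largest in absolute value. The function names `svd_rough_init` and `kmeans`, the parameter `k`, the empty-group fallback, and the toy matrix are all hypothetical and are not taken from the paper.

```python
import numpy as np


def svd_rough_init(X, k):
    """Project documents into a rank-k latent space and derive initial centers.

    X is an (n_docs, n_terms) document-term matrix, e.g. TF-IDF weights.
    Assumption (not from the paper): a document's rough group is the latent
    dimension on which its projection has the largest absolute value.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]                  # documents in the k-dim latent space
    rough = np.argmax(np.abs(Z), axis=1)  # rough group index per document
    centers = np.empty((k, k))
    for j in range(k):
        members = Z[rough == j]
        # Hypothetical fallback: use the scaled j-th axis if a group is empty.
        centers[j] = members.mean(axis=0) if len(members) else np.eye(k)[j] * s[j]
    return Z, centers


def kmeans(Z, centers, n_iter=100):
    """Plain Lloyd iterations starting from the supplied centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            if np.any(labels == j):
                new_centers[j] = Z[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers


# Toy document-term matrix: documents 0-2 and 3-5 use disjoint vocabularies.
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 4, 2, 2],
    [0, 0, 0, 2, 4, 2],
    [0, 0, 0, 2, 2, 4],
], dtype=float)

Z, init = svd_rough_init(X, k=2)
labels, final_centers = kmeans(Z, init)
print(labels)
```

In this sketch K-means runs in the reduced latent space, so the initial centers are data-driven rather than random; whether the paper clusters in the reduced or the original space, and how it chooses K, is not stated in the abstract.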