Abstract
When the fuzzy c-means (FCM) algorithm is applied to text clustering, randomly chosen initial cluster centers and cluster numbers lead to different clustering results, and the algorithm is easily trapped in a local optimum. This paper proposes using particle swarm optimization (PSO) to determine the initial cluster centers for FCM. Documents are first represented with the vector space model and reduced by feature extraction, and FCM is then applied to cluster them. Experiments on Chinese text show that the PSO-based FCM algorithm needs fewer iterations, overcomes the classical algorithm's sensitivity to initial values and its tendency to fall into local minima, and clearly improves clustering speed, accuracy, and overall clustering quality.
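The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the particle encoding, fitness function, or PSO parameters, so the choices here (each particle encodes all cluster centers, fitness is the FCM objective, inertia 0.7, acceleration coefficients 1.5) are assumptions, and the names `pso_init`, `fcm`, `memberships`, and `objective` are hypothetical. Short 2-D vectors stand in for the vector space model document vectors.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def memberships(data, centers, m=2.0):
    """Standard FCM membership of every point in every cluster."""
    power = 1.0 / (m - 1.0)
    U = []
    for x in data:
        # 1e-12 guards against division by zero when a point sits on a center
        inv = [(1.0 / (sq_dist(x, c) + 1e-12)) ** power for c in centers]
        total = sum(inv)
        U.append([v / total for v in inv])
    return U

def objective(data, centers, m=2.0):
    """FCM objective J = sum_i sum_k u_ik^m * ||x_i - c_k||^2."""
    U = memberships(data, centers, m)
    return sum(U[i][k] ** m * sq_dist(x, c)
               for i, x in enumerate(data) for k, c in enumerate(centers))

def fcm(data, centers, m=2.0, tol=1e-6, max_iter=100):
    """Classical FCM iterations starting from the given centers."""
    n, dim = len(data), len(data[0])
    for it in range(1, max_iter + 1):
        U = memberships(data, centers, m)
        new_centers = []
        for k in range(len(centers)):
            w = [U[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / total
                 for d in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(data, centers, m), it

def pso_init(data, c, n_particles=10, n_iter=30,
             w=0.7, c1=1.5, c2=1.5, m=2.0, seed=0):
    """Assumed PSO scheme: search for centers that minimise the FCM objective."""
    rng = random.Random(seed)
    dim = len(data[0])
    size = c * dim

    def unflatten(p):
        return [p[k * dim:(k + 1) * dim] for k in range(c)]

    def cost(p):
        return objective(data, unflatten(p), m)

    # Each particle starts as c distinct, randomly chosen documents.
    pos = [[v for x in rng.sample(data, c) for v in x]
           for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pcost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pcost.__getitem__)
    gbest, gcost = pbest[g][:], pcost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(size):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            f = cost(pos[i])
            if f < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], f
                if f < gcost:
                    gbest, gcost = pos[i][:], f
    return unflatten(gbest)

# Toy stand-in for feature-reduced document vectors (illustrative data only).
docs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
        [1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]
initial_centers = pso_init(docs, c=2)
centers, U, iterations = fcm(docs, initial_centers)
```

Seeding FCM with the swarm's best position means the iterations start from centers that already have a low objective value, which is one plausible reading of why the abstract reports fewer iterations and less sensitivity to initialization.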
Source
Library and Information Service (《图书情报工作》)
CSSCI
Peking University Core Journal (北大核心)
2010, Issue 6, pp. 57-60, 65 (5 pages)
Keywords
fuzzy c-means
particle swarm
text clustering