Abstract
Existing cluster ensemble spectral algorithms tend to produce unstable clustering results. To address this problem, the idea of affinity propagation clustering is introduced and an affinity propagation-based cluster ensemble spectral algorithm (APCESA) is proposed. The algorithm first uses cluster ensemble and spectral decomposition to obtain a low-dimensional embedding of the documents whose spatial structure is relatively simple, and then applies the affinity propagation algorithm to obtain the final clustering result. In the spectral decomposition step, a matrix transformation is used to avoid the high computational cost of eigenvalue decomposition in spectral algorithms. Experiments on real-world document data sets show that the proposed algorithm clusters more stably than the compared algorithms, and that both the NMI and ANMI values of its clustering results are higher than those of the compared algorithms.
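The abstract reports NMI and ANMI without defining them. The sketch below shows one common way to compute both, assuming the geometric-mean normalization NMI = I(a;b) / sqrt(H(a) H(b)) and taking ANMI as the average NMI between the consensus labeling and each base clustering in the ensemble. The paper's exact normalization is not stated in the abstract, so the formula, the function names, and the toy labels are all illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed convention)."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # Mutual information computed from the contingency counts.
    mi = sum(
        (n_ab / n) * log(n * n_ab / (count_a[x] * count_b[y]))
        for (x, y), n_ab in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Assumed convention: a single-cluster labeling carries no information.
        return 0.0
    return mi / sqrt(h_a * h_b)


def anmi(consensus, base_clusterings):
    """Average NMI between a consensus labeling and each ensemble member."""
    scores = [nmi(consensus, base) for base in base_clusterings]
    return sum(scores) / len(scores)


# Hypothetical toy data: six documents, two true classes.
truth = [0, 0, 0, 1, 1, 1]
consensus = [1, 1, 1, 0, 0, 0]  # same partition as truth, labels permuted
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]

nmi_vs_truth = nmi(consensus, truth)
anmi_vs_ensemble = anmi(consensus, ensemble)
print(f"NMI = {nmi_vs_truth:.3f}, ANMI = {anmi_vs_ensemble:.3f}")
```

NMI needs ground-truth class labels, whereas ANMI only needs the ensemble itself, which is why the two are often reported together. Both are invariant to how cluster labels are numbered.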
Source
Journal of Harbin Engineering University
EI
CAS
CSCD
Peking University Core Journals
2012, No. 7, pp. 899-905 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (60975042)
Keywords
affinity propagation algorithm
cluster ensemble
document clustering
spectral clustering
matrix transformation