Abstract
This paper proposes MLACE, a clustering ensemble of local adaptive soft subspace clustering (LAC) based on multimodal perturbation: random initialization, parameter perturbation, and feature subset projection. MLACE has three characteristics. (i) Multi-perturbation ensemble: it probes the internal structure of the data from the different angles of initialization, parameters, and feature subsets, and combines the resulting diverse and complementary decisions to improve clustering accuracy. (ii) Refined ensemble information: the probability that each instance belongs to each cluster is defined from the subspace weight matrix output by LAC. (iii) Improved consensus function: because the ensemble information changes from 0/1 binary values to real values in [0,1], the consensus function can use Fast global K-means, a well-performing real-valued clustering ensemble method, to further improve the accuracy of the ensemble. Two synthetic datasets and five UCI datasets were chosen to evaluate the accuracy of MLACE. The experimental results show that MLACE is more accurate than K-means, LAC, and the LAC clustering ensemble based on parameter perturbation (P-MLACE).
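The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: plain k-means stands in for LAC as the base clusterer, a softmax over negative squared distances stands in for the membership probability that the paper derives from the LAC subspace weight matrix, and plain k-means stands in for Fast global K-means as the consensus function. All function names, the bandwidth `h`, and its sampling range are hypothetical.

```python
import math
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(row, centers):
    return min(range(len(centers)), key=lambda j: sqdist(row, centers[j]))

def kmeans(rows, k, rng, iters=20):
    """Plain Lloyd's k-means (stand-in clusterer); returns the centroids."""
    centers = [list(r) for r in rng.sample(rows, k)]  # random initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(r, centers)].append(r)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def soft_membership(row, centers, h):
    """Assumed membership: softmax of -squared distance / h, values in [0,1]."""
    d = [sqdist(row, c) for c in centers]
    m = min(d)  # shift so the largest exponent is 0 (numerical stability)
    e = [math.exp(-(x - m) / h) for x in d]
    s = sum(e)
    return [x / s for x in e]

def ensemble_matrix(data, k, n_members, seed=0):
    """Concatenate one k-column soft membership block per ensemble member."""
    rng = random.Random(seed)
    n_feat = len(data[0])
    z = [[] for _ in data]
    for _ in range(n_members):
        # feature subset perturbation: cluster on a random half of the features
        feats = sorted(rng.sample(range(n_feat), max(1, n_feat // 2)))
        h = rng.uniform(0.5, 5.0)  # parameter perturbation (hypothetical range)
        proj = [[r[f] for f in feats] for r in data]
        centers = kmeans(proj, k, rng)
        for i, r in enumerate(proj):
            z[i].extend(soft_membership(r, centers, h))
    return z

def consensus(data, k, n_members=5, seed=0):
    """Cluster the real-valued ensemble matrix to obtain the final labels."""
    z = ensemble_matrix(data, k, n_members, seed)
    centers = kmeans(z, k, random.Random(seed + 1))
    return [nearest(r, centers) for r in z]

# Toy data: two groups of 10 points in 4 dimensions.
demo_rng = random.Random(42)
data = [[demo_rng.gauss(c, 0.3) for _ in range(4)]
        for c in (0.0, 10.0) for _ in range(10)]
labels = consensus(data, k=2)
```

Because each member contributes real-valued memberships rather than hard 0/1 labels, the consensus step can operate on an ordinary numeric matrix, and cluster labels do not need to be aligned across members beforehand.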
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2014, Issue 2, pp. 240-244 (5 pages)
Computer Science
Funding
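National Natural Science Foundation of China (61070033, 61100148, 61202269)
Natural Science Foundation of Guangdong Province (S2011040004804)
Guangdong Province Science and Technology Planning Project (2010B050400011)
Open Project of the State Key Laboratory for Novel Software Technology (KFKT2011B19)
Guangdong Universities Outstanding Young Innovative Talents Training Project (LYM11060)
Guangzhou Science and Technology Planning Project (12C42111607, 201200000031)
Panyu District Science and Technology Planning Project (2012-Z-03-67)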
Keywords
Clustering ensemble, Soft subspace clustering, Local adaptive soft subspace clustering, Multimodal perturbation