Abstract
To address the curse of dimensionality and the noise contamination that clustering analysis often faces, a robust subspace clustering algorithm is proposed that combines sample weighting with subspace clustering. Building on existing subspace clustering methods, the algorithm computes for each cluster a weight vector that reflects how much each dimension contributes to identifying that cluster, and combines the dimensions under these weights to obtain the subspace in which the cluster lies. In addition, the algorithm assigns each sample a scale parameter that reflects its degree of outlyingness, so that normal samples and outliers play different roles during clustering, which keeps the algorithm robust. Comparative experiments on two-dimensional, high-dimensional, and gene datasets show that the algorithm achieves high clustering accuracy and good robustness on datasets of various dimensions with different noise ratios.
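The two ideas described in the abstract, a per-cluster weight vector over dimensions and a per-sample scale that down-weights outliers, can be illustrated roughly as follows. This is a minimal sketch and not the paper's algorithm: the function name `robust_subspace_kmeans`, the softmax-style dimension weights (similar in spirit to entropy-weighted k-means), the exponential sample scale, and the parameters `gamma` and `n_iter` are all assumptions chosen for illustration, since the abstract does not give the objective function or update rules.

```python
import numpy as np

def robust_subspace_kmeans(X, init_centers, gamma=1.0, n_iter=20):
    """Illustrative k-means variant (hypothetical, not the published method).

    W[j, d]: weight of dimension d for cluster j (each row sums to 1).
    s[i]:    scale in [0, 1] for sample i; small values mark likely outliers.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    n, d = X.shape
    k = centers.shape[0]
    W = np.full((k, d), 1.0 / d)   # start with all dimensions equally weighted
    s = np.ones(n)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Weighted squared distance of every sample to every center: (n, k).
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2
        dist = np.einsum("nkd,kd->nk", diff2, W)
        labels = dist.argmin(axis=1)
        dmin = dist[np.arange(n), labels]

        # Assumed robust scale: samples far from their center (relative to
        # the median distance) get a scale close to 0.
        sigma = np.median(dmin) + 1e-12
        s = np.exp(-dmin / sigma)

        for j in range(k):
            mask = labels == j
            sw = s[mask]
            if not mask.any() or sw.sum() == 0.0:
                continue  # leave an empty or fully down-weighted cluster as is
            # Scale-weighted center and per-dimension dispersion.
            centers[j] = (sw[:, None] * X[mask]).sum(axis=0) / sw.sum()
            disp = (sw[:, None] * (X[mask] - centers[j]) ** 2).sum(axis=0) / sw.sum()
            # Assumed weight rule: compact dimensions receive larger weights.
            logits = -disp / gamma
            w = np.exp(logits - logits.max())
            W[j] = w / w.sum()

    return labels, centers, W, s
```

In this sketch, a dimension along which a cluster is spread out receives a small entry in that cluster's row of `W`, and a sample far from every center receives a scale `s` near zero and so barely influences the center and weight updates. Initial centers are passed in explicitly because the result of this kind of alternating scheme depends on initialization.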
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2011, No. 6, pp. 13-19 (7 pages)
Funding
National Natural Science Foundation of China (61070137, 60933009)
Shaanxi Province Science and Technology Research Project (2009K1-56)
Keywords
subspace clustering
robustness
weight parameter
optimization