Abstract
Existing subspace clustering methods achieve low clustering accuracy when clusters overlap. To address this problem, an overlapping subspace clustering algorithm based on a probability model (OSCPM) is proposed. First, the high-dimensional data are partitioned into several subspaces using a mixed-norm subspace representation. Then, a probability model following an exponential family distribution is used to identify the overlapping parts of the data within the subspaces and to assign the data to the correct subspaces, which yields the clustering result. During parameter estimation, an alternating maximization method is used to determine the optimal solution of the objective function. Experiments on artificial datasets and UCI datasets show that OSCPM achieves better clustering performance than the compared algorithms and is suitable for relatively large-scale datasets.
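The alternating-maximization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the OSCPM algorithm itself: it omits the mixed-norm subspace step, takes the Gaussian as the exponential-family member, and assumes (hypothetically) that a point belonging to several clusters is centered at the sum of their prototypes. The function name `fit_overlapping` and all parameters are illustrative only.

```python
import itertools
import numpy as np


def fit_overlapping(X, k, n_iter=20, seed=0):
    """Alternate between binary memberships Z (n x k) and prototypes A (k x d).

    Assumed model (illustrative, not taken from the paper): X ~ Normal(Z @ A, I),
    so maximizing the likelihood equals minimizing the squared residual.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Every non-empty membership pattern; enumeration is feasible only for small k.
    patterns = np.array(
        [p for p in itertools.product([0, 1], repeat=k) if any(p)], dtype=float
    )
    Z = patterns[rng.integers(len(patterns), size=n)]
    losses = []
    for _ in range(n_iter):
        # Step 1: fix Z and update A by least squares.
        A, *_ = np.linalg.lstsq(Z, X, rcond=None)
        # Step 2: fix A and give each point the pattern with the smallest residual.
        recon = patterns @ A
        dist = ((X[:, None, :] - recon[None, :, :]) ** 2).sum(axis=2)
        Z = patterns[dist.argmin(axis=1)]
        losses.append(float(dist.min(axis=1).sum()))
    return Z, A, losses


# Synthetic data: two clusters plus a group of points that belongs to both.
rng = np.random.default_rng(1)
A_true = np.array([[5.0, 0.0], [0.0, 5.0]])
Z_true = np.array([[1, 0]] * 30 + [[0, 1]] * 30 + [[1, 1]] * 30, dtype=float)
X = Z_true @ A_true + 0.1 * rng.standard_normal((90, 2))
Z, A, losses = fit_overlapping(X, k=2)
```

Each step can only lower or keep the squared residual, so `losses` is non-increasing. As with most alternating schemes, the result is a local optimum that depends on the initialization.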
Authors
QIU Yunfei (邱云飞)
FEI Bowen (费博雯)
LIU Daqian (刘大千)
Affiliations: School of Software, Liaoning Technical University, Huludao 125105; School of Business Administration, Liaoning Technical University, Huludao 125105; School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals
2017, Issue 7, pp. 609-621 (13 pages)
Funding
Supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 61401185)
Keywords
Overlapping Subspace Clustering
Mixed-Norm
Subspace Representation
Probabilistic Model
Alternating Maximization