Abstract
With the advent of big data, how to cluster, analyze, and make effective use of massive high-dimensional data has become an active research topic. When traditional clustering algorithms are applied to high-dimensional data, their results have low accuracy and stability. Subspace clustering algorithms instead partition the feature space of the original data into different feature subsets. This greatly reduces the influence of irrelevant features on the clustering result and uncovers information that is hard to see in high-dimensional data, which gives these algorithms a significant advantage on such data. Existing graph-based subspace clustering algorithms, however, are limited when they must handle noise of unknown type and solve complex convex problems. To address these limitations, this paper builds on subspace clustering and spatial projection theory and proposes a projection-based robust low-rank subspace clustering algorithm. First, the original data is projected, the noise in the projection space is eliminated by coding, and missing data is compensated. Then a new method, the ℓ2-graph, is used to construct a sparse similarity graph. Finally, subspace clustering is performed on the ℓ2-graph. The algorithm requires no prior knowledge of the noise type, and the ℓ2-graph describes the sparsity and spatial dispersion of high-dimensional data well. Three face datasets are used in the experiments. The optimal parameters affecting clustering performance are determined first, and the algorithm is then evaluated in terms of accuracy, robustness, and time complexity. The experimental results show that when noise of unknown type is mixed into the three face datasets, the algorithm achieves high accuracy, low time complexity, and good robustness.
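The abstract outlines the pipeline only at a high level. As a rough illustration of the ℓ2-graph idea it builds on, the Python sketch below codes each sample over all the other samples with an ℓ2-norm (ridge) penalty, keeps only the k largest coefficients of each code to suppress small, noise-dominated entries, symmetrizes the result into an affinity matrix W = |C| + |C|ᵀ, and clusters W with normalized spectral clustering. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `l2_graph_coefficients` and `spectral_clustering`, the parameters `lam` and `k`, and the toy data are hypothetical, and the paper's projection, coding-based denoising, and missing-data compensation steps are not reproduced. Only NumPy is assumed.

```python
import numpy as np


def l2_graph_coefficients(X, lam=0.1, k=5):
    """Code each column of X over all other columns with an l2 (ridge) penalty.

    Hypothetical sketch: c_i = argmin_c ||x_i - D_i c||^2 + lam * ||c||^2,
    where D_i is X without column i. Only the k largest |coefficients| of
    each code are kept, as an assumed stand-in for noise elimination.
    """
    n = X.shape[1]
    C = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]  # x_i may not code itself
        D = X[:, others]
        # Closed-form ridge solution: (D^T D + lam * I) c = D^T x_i
        c = np.linalg.solve(D.T @ D + lam * np.eye(n - 1), D.T @ X[:, i])
        C[others, i] = c
        # Zero every entry except the k largest in magnitude
        small = np.argsort(np.abs(C[:, i]))[:-k]
        C[small, i] = 0.0
    return C


def spectral_clustering(W, n_clusters, n_iter=100):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) with a small k-means."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                          # ascending eigenvalues
    U = vecs[:, -n_clusters:]                            # top eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # row-normalize

    # Farthest-point initialization, then Lloyd iterations
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical toy data: 10 points in each of two orthogonal 2-D subspaces of R^6
rng = np.random.default_rng(0)
X = np.zeros((6, 20))
X[0:2, :10] = rng.standard_normal((2, 10))
X[2:4, 10:] = rng.standard_normal((2, 10))
X /= np.linalg.norm(X, axis=0)  # unit-norm columns

C = l2_graph_coefficients(X, lam=0.1, k=5)
W = np.abs(C) + np.abs(C).T  # symmetric affinity matrix
labels = spectral_clustering(W, n_clusters=2)
print(labels)
```

On two orthogonal subspaces, the ridge code of a point places essentially zero weight on points from the other subspace, so after truncation the affinity matrix is block diagonal and the two groups are recovered. The truncation parameter k trades graph sparsity against connectivity within each subspace.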
Authors
XING Yu-hua (邢毓华)
LI Ming-xing (李明星)
College of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University core journal
2020, Issue 6, pp. 92-97 (6 pages)
Funding
National Natural Science Foundation of China (51307140)
Keywords
High-dimensional data
Noise
Subspace clustering
Spatial projection
ℓ2-graph