Abstract
Existing subspace clustering based on low-rank representation (LRR) cannot handle large-scale data effectively and achieves limited clustering accuracy, while the distributed low-rank subspace clustering algorithm (DFC-LRR) cannot handle high-dimensional data directly. To address these issues, a subspace clustering algorithm based on tensors and distributed computing is proposed. The algorithm first treats high-dimensional data as a tensor and introduces tensor multiplication into the self-representation of the data, which extends LRR subspace clustering to high-dimensional data. It then computes the low-rank coefficient tensor by distributed parallel computing and sparsifies every lateral slice of the coefficient tensor to obtain a sparse similarity matrix. Comparative experiments against DFC-LRR on the public Extended Yale B, COIL20, and UCSD datasets show that the proposed algorithm effectively improves clustering accuracy and that distributed computing markedly reduces the running time.
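The abstract does not spell out the model or the solver, so the following is only a minimal sketch of the three steps under stated assumptions, not the authors' implementation. It assumes that "tensor multiplication" is the t-product (matching frontal slices multiplied in the Fourier domain along the third mode), that each sample is a lateral slice of the data tensor, and that the noise term is dropped. Under that noiseless assumption, minimizing the tensor nuclear norm (taken here as the sum of nuclear norms of the Fourier-domain slices) decouples per slice and has the closed form pinv(X̄_k) X̄_k. The "distributed" step is illustrated as a divide-and-combine over blocks of lateral slices, with threads standing in for cluster nodes. The sparsification rule (keep the k largest tube norms in each lateral slice, excluding self-loops) is likewise an assumption. All names (`t_product`, `pinv_svd`, `solve_block`, `distributed_tlrr`, `sparse_similarity`) and parameters (`n_blocks`, `k`, `tol`) are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3): FFT along mode 3,
    multiply matching frontal slices, inverse FFT."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum("ijk,jlk->ilk", Af, Bf)  # one matrix product per Fourier slice
    return np.real(np.fft.ifft(Cf, axis=2))


def pinv_svd(M, tol=1e-10):
    """Pseudo-inverse via SVD, dropping singular values below tol * largest."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    return Vh[:r].conj().T @ np.diag(1.0 / s[:r]) @ U[:, :r].conj().T


def solve_block(Xf, cols):
    """Noiseless sub-problem (assumption) for the lateral slices in `cols`:
    min TNN(Z_b) s.t. X[:, cols, :] = X * Z_b. It decouples over Fourier
    slices, each solved in closed form by pinv(Xf_k) @ Xf_k[:, cols]."""
    slices = [pinv_svd(Xf[:, :, k]) @ Xf[:, cols, k] for k in range(Xf.shape[2])]
    return np.real(np.fft.ifft(np.stack(slices, axis=2), axis=2))


def distributed_tlrr(X, n_blocks=3):
    """Divide the lateral slices into blocks, solve the blocks in parallel
    (threads stand in for cluster nodes), then concatenate the results."""
    Xf = np.fft.fft(X, axis=2)
    blocks = np.array_split(np.arange(X.shape[1]), n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        parts = list(pool.map(lambda cols: solve_block(Xf, cols), blocks))
    return np.concatenate(parts, axis=1)


def sparse_similarity(Z, k=5):
    """Sparsify every lateral slice Z[:, j, :] by keeping its k strongest tubes
    (largest ||Z(i, j, :)||_2, self excluded), then symmetrize."""
    norms = np.linalg.norm(Z, axis=2)  # norms[i, j] = ||Z(i, j, :)||_2
    np.fill_diagonal(norms, 0.0)
    S = np.zeros_like(norms)
    for j in range(norms.shape[1]):
        keep = np.argsort(norms[:, j])[-k:]
        S[keep, j] = norms[keep, j]
    return (S + S.T) / 2


# Synthetic union of tensor subspaces: K clusters, each spanned by a rank-r basis tensor.
rng = np.random.default_rng(0)
d, r, n3, m, K = 20, 2, 3, 10, 3  # ambient dim, subspace rank, tube length, samples/cluster, clusters
X = np.concatenate(
    [t_product(rng.standard_normal((d, r, n3)), rng.standard_normal((r, m, n3)))
     for _ in range(K)],
    axis=1,
)  # d x (K*m) x n3; lateral slice j is sample j
labels = np.repeat(np.arange(K), m)

Z = distributed_tlrr(X, n_blocks=3)
W = sparse_similarity(Z, k=5)
print("reconstruction error:", np.abs(X - t_product(X, Z)).max())
print("max cross-cluster weight:", W[labels[:, None] != labels[None, :]].max())
```

In this noiseless setting, each block's solution is exactly the corresponding set of lateral slices of the full solution, so concatenation is a lossless combine step. With a noise term, as in DFC-LRR, the block solutions no longer agree exactly and a separate combine step is needed. The matrix `W` would then typically be passed to spectral clustering.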
Authors
LIU Xiaolan, PAN Gan, YI Miao, LI Zhipeng
Affiliations: School of Mathematics, South China University of Technology, Guangzhou 510640, Guangdong, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, Jiangsu, China; College of Physical Science and Technology, Yichun University, Yichun 336000, Jiangxi, China; School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China
Source
Journal of South China University of Technology (Natural Science Edition), 2019, No. 8, pp. 77-83, 95 (8 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
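Funding
National Natural Science Foundation of China (61502175, 61273295)
Natural Science Foundation of Guangdong Province (2016A030313545)
Guangzhou Science and Technology Program (201607010069)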
Keywords
low-rank representation
subspace clustering
distributed computing
tensor