期刊文献+

一种基于MPI的稀疏化局部尺度并行谱聚类算法的研究与实现 被引量:3

A sparse local scaling parallel spectral clustering algorithm based on MPI
下载PDF
导出
摘要 谱聚类算法由于其可识别非凸数据分布、可有效避免局部最优解、不受数据点维数限制等优点,在许多领域得到广泛应用。然而,随着数据量的增大和数据维数的增多,在保证聚类准确性的前提下,尽可能降低计算时间将变得非常必要。此外,影响谱聚类算法聚类质量的因素除数据集本身外,还与所采用的求解距离矩阵的方法、相似性矩阵的尺度参数、Laplacian矩阵形式等多种因素相关。针对以上问题,首先对于大规模数据问题,将MPI并行编程模型应用于谱聚类算法;然后利用t-最近邻方法对谱聚类算法中较大维数的Laplacian矩阵进行近似转化,同时用局部尺度(Local Scaling)参数对算法中的尺度参数进行自动调节。基于上述分析,提出了一种谱聚类并行实现算法,即稀疏化局部尺度并行谱聚类算法SLSPSC,并在四个数据集上进行了测试,与现有的并行谱聚类算法PSC在运行时间和聚类质量两方面做了比较分析。实验结果显示,该算法降低了求解Laplacian矩阵的总时间,同时部分数据集聚类质量得到较大提高。 The spectral clustering algorithm is widely used in many fields because of its advantages of identifying the non-convex data distribution, and effectively avoiding the local optimal solution without the dimension limitation of data points. However, with the growth of the amount and dimension of the data, it is very necessary to reduce the algorithm's computation time on the premise of guaranteeing the clustering accuracy. Moreover, besides the data set itself, the factors affecting the clustering quality of the spectral clustering algorithm include the method of solving distance matrix, the scale parameters of similarity matrix and the form of Laplacian matrix. Aiming at the problems mentioned above, we apply the message passing interface (MPI) parallel programming model to the spectral clustering algorithm. The t-nearest neighbor method is then used in the transformation of Laplacian matrix approximation in the spectral clustering algorithm. Meanwhile, we select the local scaling parameter as the self-tuning scaling parameter in the algorithm. Based on the above analysis, we propose a parallel implementation of the sparse local scaling parallel spectral clustering (SLSPSC), then conduct experiments on four differ- ent data sets, and analyze and compare the results with those of the current parallel spectral clustering (PSC) in running time and clustering quality. Experimental results show that the total computation time of the SLSPSC is greatly reduced when calculating the Laplacian matrix, and the quality of some data sets is improved.
出处 《计算机工程与科学》 CSCD 北大核心 2016年第5期839-847,共9页 Computer Engineering & Science
基金 数学工程与先进计算国家重点实验室开放基金(2014A03)
关键词 并行谱聚类 稀疏化 局部尺度 MPI parallel spectral clustering sparsification local scaling MPI
  • 相关文献

参考文献22

  • 1Qiwei Zhong,Yunlong Lin,Junyang Zou,Kuangyan Zhu,Qiao Wang,Lei Hu.Parallel Spectral Clustering Based on MapReduce[J].ZTE Communications,2013,11(2):45-50. 被引量:3
  • 2Shi J,Malik J. Normalized cuts and image segmentation[J]. IEEE T Pattern Anal,2002,22(8) 888-905.
  • 3Ng A Y, Jordan M I, Weiss Y. On spectral clustering:Anal- ysis and an algorithm[C]// Advances in Neural Information Processing Systems, 2001:849-856.
  • 4Ning Hua-zhong, Xu Wei,Chi Yun,et al. Incermental spec- tral clustering by efficiently uptating the eigen-system[J]. Pattern Reeognition-PR, 2010,43 ( 1 ) : 113-127.
  • 5Yuan Chun-miao, Fan Kai-xiang, Sun Xue-mei. A self-adap- tive spectral clustering based on geodesic distance and shared nearest neighbors[J]. Computer Modelling New Technol- ogiesm 2014,18(13D)22-27.
  • 6Chen Wen-Yen,Song Yang-qiu, Bai Hong-jie, et al. Parallel spectral clustering in distributed systems[J]. IEEE Transac- tions on Pattern Analysis Machine Intelligence, 2011,33 (3) ,568-586.
  • 7Cui Xiao-hui, Charles J St, Potok T E. The GPU enhanced parallel computing for large scale data clustering[C]//2011 International Conference on Cyber-Enabled Distributed Com- puting and Knowledge(CyberC), 2011 : 220-225.
  • 8Zelnik-Manor L,Perona P. Self-tuning spectral elustering[C] //Proe of NIPS,2005:1601-1608.
  • 9Perona P,Freeman W. A factorization approach to grouping [C]//Proc of Computer Vision-ECCV98,1999 : 655-670.
  • 10Von Luxburg U. A tutorial on spectral clustering[J]. Statis- tics and Computing, 2007,17(4)395-416.

二级参考文献10

  • 1U. yon Luxburg, A Tutorial on Spectral Clustering," Statistivs and Computing, vol. 17, pp. 395-416, Aug. 2007.
  • 2Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang, "Parallel Spectral Clustering in Distributed Systems, IEEE Transac- tions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 568-586, Mar. 2011.
  • 3U. yon Luxburg, O. Bousquet, and M. Belkin, "Limits of Spe:tral Clustering," Neural Information Processing Systems bbundation, 2004.
  • 4Yangqiu Song, Wen-Yen Chen, Hongjie Bai, Chih-Jen Lin, and Edward Y. Chang, Parallel Spectral Clustering," Machine Learning and Knowledge Dis- covery in Datatases, vol. 5212, pp. 374-389, 2008.
  • 5J. Dean and S. Ghemawat, =MapReduce: Simplified Data Prucessing on Large Clusters,= Communications of the ACM-50th anniversary issue: 1958-2008, roe 51, pp. 107-113, Jan. 2008.
  • 6Fan R. K. Chung, Spectral Graph Theory (CBM,q Regional Conference SeHe, in Mathematics, No. 92), Providence, RI: Ameriean Mathematical Soeiety, 2007.
  • 7Weizhong Zhao, Huifang Ma, and Qing He, "Parallel K-Means Clustering Based on MapReduee," Cloud Computing: First International Conferenee, Beijing, Chi- na, Dee. 2009, pp. 674 - 679.
  • 8M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physk'al Review E: 69, 026113, 2004.
  • 9A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E. 70, 066111, 2004.
  • 10A. Lancichinetti, S. Fortunato, and F. Radieehi, "Benchmark graphs for testing community detection algorithms". Physical Review E. 78, 046110, 2008.

共引文献2

同被引文献10

引证文献3

二级引证文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部