Parallel Implementation and Optimization of a Self-Tuning Spectral Clustering Algorithm (自适应谱聚类算法并行实现及优化)


Abstract: Spectral clustering is a clustering method based on spectral graph partitioning theory, and it performs well on high-dimensional data and on non-convex data distributions. For large-scale data, however, it is limited by computation time and storage space. This paper presents a parallel self-tuning spectral clustering algorithm. Local computation combined with asynchronous loop communication minimizes the number of data exchanges in parallel spectral clustering, and overlapping computation with communication further reduces the communication overhead. In the implementation, PLOBPCG, a parallel solver developed by the authors for the locally optimal block preconditioned conjugate gradient method, is used for the eigenvector-based dimensionality reduction step of spectral clustering. Tests on two classes of large-scale data on the "Era" supercomputer of the Chinese Academy of Sciences show nearly linear speedup on 2048 cores, with parallel efficiency above 96%.
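The abstract does not spell out the communication scheme, so the following is only a minimal serial sketch of one plausible reading of "local computation plus loop communication", not the authors' implementation. It assumes a ring pattern: each of P simulated ranks owns one block of points, computes distances against the block it currently holds, and then passes that block to its neighbour, so that after P steps every rank holds its full block-row of the distance matrix. The local scale (distance to the k-th nearest neighbour) and the affinity exp(-d_ij^2 / (sigma_i * sigma_j)) follow the standard self-tuning formulation of Zelnik-Manor and Perona. All function names, the sample data, and the choice k=1 are hypothetical, and the points are assumed to be distinct so that no local scale is zero.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def ring_distance_rows(blocks):
    """Serially simulate P ranks filling their block-rows via ring shifts.

    Assumption: a real code would use non-blocking sends/receives here so
    the shift overlaps with the distance computation; this loop only
    reproduces the data movement, not the overlap.
    """
    P = len(blocks)
    offsets = [sum(len(b) for b in blocks[:r]) for r in range(P)]
    n = sum(len(b) for b in blocks)
    rows = [[[0.0] * n for _ in blocks[r]] for r in range(P)]
    # held[r] = (owner rank, block) currently sitting in rank r's buffer
    held = [(r, blocks[r]) for r in range(P)]
    for _ in range(P):
        for r in range(P):
            owner, visiting = held[r]
            for i, p in enumerate(blocks[r]):
                for j, q in enumerate(visiting):
                    rows[r][i][offsets[owner] + j] = sq_dist(p, q)
        # Ring shift: rank r receives the buffer of rank r-1.
        held = [held[(r - 1) % P] for r in range(P)]
    return rows

def local_scales(rows, k):
    """sigma_i = distance from point i to its k-th nearest neighbour."""
    # Index 0 of the sorted row is the point itself (distance 0).
    return [math.sqrt(sorted(row)[k]) for block in rows for row in block]

def affinity(rows, sigmas):
    """Self-tuning affinity: A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), A_ii = 0."""
    flat = [row for block in rows for row in block]
    n = len(flat)
    return [[0.0 if i == j
             else math.exp(-flat[i][j] / (sigmas[i] * sigmas[j]))
             for j in range(n)] for i in range(n)]

# Hypothetical data: six 2-D points split over three simulated ranks.
blocks = [[(0.0, 0.0), (0.0, 1.0)],
          [(1.0, 0.0), (5.0, 5.0)],
          [(5.0, 6.0), (6.0, 5.0)]]
rows = ring_distance_rows(blocks)
sigmas = local_scales(rows, k=1)
A = affinity(rows, sigmas)
```

In this sketch each rank exchanges exactly P-1 blocks with a single neighbour, which is the property that makes ring-style schemes attractive for reducing the number of communications; the eigenvector computation that PLOBPCG performs in the paper is not reproduced here.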

Authors: Su Lin (苏琳), Zhao Yonghua (赵永华), Li Ruilin (李瑞琳) (University of Chinese Academy of Sciences, Beijing 100049, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)

Affiliations: University of Chinese Academy of Sciences; Computer Network Information Center, Chinese Academy of Sciences

Source: E-science Technology & Application (《科研信息化技术与应用》), 2016, No. 6, pp. 44-53 (10 pages)

Funding: Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2014A03)

Keywords: spectral clustering; parallel algorithm; self-tuning; PLOBPCG; loop communication

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]



