Approximate Weighted Kernel k-means for Large-Scale Spectral Clustering

Original title: 求解大规模谱聚类的近似加权核k-means算法. Cited by: 30


Abstract: Spectral clustering is a clustering method based on algebraic graph theory that recasts clustering as a graph partitioning problem. To optimize the graph-cut objective, one usually exploits the properties of the Rayleigh quotient: the eigenvectors of the Laplacian matrix map the original data points into a low-dimensional eigenspace, where clustering is then performed. However, storing the similarity matrix takes O(n^2) space, and the eigendecomposition of the Laplacian matrix generally takes O(n^3) time, which is unacceptable for large-scale data sets. It can be proved that both Normalized Cut graph clustering and weighted kernel k-means are equivalent to a matrix trace maximization problem, so the weighted kernel k-means algorithm can optimize the Normalized Cut objective without eigendecomposing the Laplacian matrix. Weighted kernel k-means still needs the full kernel matrix, however, so its space complexity remains O(n^2). To address this challenge, the paper proposes an approximate weighted kernel k-means algorithm that uses only part of the kernel matrix to solve spectral clustering problems on big data. Theoretical analysis and experimental comparison show that approximate weighted kernel k-means achieves clustering performance similar to that of weighted kernel k-means while greatly reducing time and space complexity.
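The equivalence mentioned in the abstract can be illustrated with a minimal sketch of exact weighted kernel k-means applied to the Normalized Cut objective. This is not the paper's approximate algorithm. It assumes the standard construction from the weighted kernel k-means literature, in which the point weights are the node degrees and the kernel is K = sigma * D^{-1} + D^{-1} A D^{-1} for an affinity matrix A with degree matrix D. The function names `ncut_kernel` and `weighted_kernel_kmeans` and the `sigma` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def ncut_kernel(A, sigma=0.0):
    """Build kernel and weights for Normalized Cut (assumed construction).

    K = sigma * D^{-1} + D^{-1} A D^{-1}, weights = node degrees.
    sigma is an optional diagonal shift to make K positive semidefinite.
    """
    d = A.sum(axis=1)                       # node degrees
    K = A / np.outer(d, d)                  # D^{-1} A D^{-1}
    K[np.diag_indices_from(K)] += sigma / d
    return K, d

def weighted_kernel_kmeans(K, w, labels, n_clusters, max_iter=100):
    """Weighted kernel k-means on a full n x n kernel matrix K.

    Squared distance of point i to the weighted mean of cluster c:
      K_ii - 2 * sum_{j in c} w_j K_ij / s_c
           + sum_{j,l in c} w_j w_l K_jl / s_c^2,   s_c = sum_{j in c} w_j
    """
    n = K.shape[0]
    labels = np.asarray(labels).copy()
    for _ in range(max_iter):
        # Z[j, c] = w_j if point j currently belongs to cluster c, else 0
        Z = np.zeros((n, n_clusters))
        Z[np.arange(n), labels] = w
        s = Z.sum(axis=0)                   # total weight per cluster
        empty = s == 0
        s[empty] = 1.0                      # avoid division by zero
        KZ = K @ Z                          # (n, k) weighted kernel sums
        second = 2.0 * KZ / s
        third = np.einsum("jc,jc->c", Z, KZ) / s**2
        dist = np.diag(K)[:, None] - second + third
        dist[:, empty] = np.inf             # never assign to an empty cluster
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                           # assignments are stable
        labels = new_labels
    return labels
```

Note that this sketch stores the full kernel matrix, which is exactly the O(n^2) memory cost that the proposed approximate algorithm is designed to avoid by using only part of the kernel matrix.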

Authors: Jia Hongjie, Ding Shifei, Shi Zhongzhi

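Affiliations: School of Computer Science and Technology, China University of Mining and Technology; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences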

Source: Journal of Software (软件学报), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 11, pp. 2836-2846 (11 pages)

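Funding: National Basic Research Program of China (973 Program) (2013CB329502); National Natural Science Foundation of China (61379101); Jiangsu Province Research Innovation Program for Graduate Students in Regular Universities (KYLX15_1442)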

Keywords: spectral clustering; trace maximization; weighted kernel k-means; approximate kernel matrix; big data

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]





