基于自适应Nystrm采样的大数据谱聚类算法被引量：26

Spectral Clustering Algorithm Based on Adaptive Nystrm Sampling for Big Data Analysis

Abstract: Spectral clustering is a flexible and effective clustering method for data sets with complex structure. Based on spectral graph theory, it maps the data points into a low-dimensional space spanned by eigenvectors, which improves the structure of the data and yields satisfactory clustering results. However, the eigen-decomposition step usually has a computational complexity of O(n^3), which limits the application of spectral clustering to big data. The Nyström extension method uses a subset of points sampled from the data set to approximate the true eigenspace, which effectively reduces the computational complexity and offers a new approach to spectral clustering on big data. Because the choice of sampling strategy is critical to the Nyström extension, this paper designs an adaptive Nyström sampling method in which the sampling probability of every data point is updated after each sampling pass, and proves theoretically that the sampling error decreases exponentially as the number of sampling passes increases. Based on this adaptive Nyström sampling method, a spectral clustering algorithm suited to big data is proposed, and its feasibility and effectiveness are verified experimentally.
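The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a Gaussian similarity function and extends the eigenvectors of the small landmark-by-landmark block to all n points with the standard Nyström formula. The probability update shown (sampling proportional to the diagonal of the residual K - C W^+ C^T) is a stand-in chosen for illustration. The function names, `sigma`, the batch size, and the toy data are all hypothetical, and the degree normalization used in spectral clustering is omitted for brevity.

```python
import numpy as np


def gaussian_similarity(X, Y, sigma=1.0):
    """Gaussian (RBF) similarity between every row of X and every row of Y."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def adaptive_landmarks(X, m, batch=2, sigma=1.0, seed=0):
    """Pick m landmark indices in batches, refreshing the sampling
    probabilities after every batch (illustrative rule, not the paper's)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    probs = np.full(n, 1.0 / n)              # first pass: uniform sampling
    chosen = []
    while len(chosen) < m:
        size = min(batch, m - len(chosen))
        new = rng.choice(n, size=size, replace=False, p=probs)
        chosen.extend(new.tolist())
        C = gaussian_similarity(X, X[chosen], sigma)   # n x |chosen| block
        W_pinv = np.linalg.pinv(C[chosen], rcond=1e-10, hermitian=True)
        # Diagonal of the residual K - C W^+ C^T; K_ii = 1 for this kernel.
        resid = 1.0 - ((C @ W_pinv) * C).sum(axis=1)
        resid = np.maximum(resid, 0.0) + 1e-12         # guard against round-off
        resid[chosen] = 0.0                            # never re-draw a landmark
        probs = resid / resid.sum()
    return np.array(chosen)


def nystrom_eigenpairs(X, landmarks, k, sigma=1.0):
    """Approximate the top-k eigenpairs of the full n x n similarity matrix
    from the n x m block of similarities to the landmarks."""
    n, m = X.shape[0], len(landmarks)
    C = gaussian_similarity(X, X[landmarks], sigma)    # n x m
    W = C[landmarks]                                   # m x m
    vals, vecs = np.linalg.eigh(W)                     # ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]  # keep the k largest
    U = np.sqrt(m / n) * (C @ vecs) / vals             # Nyström extension
    return (n / m) * vals, U


# Toy data: two tight, well-separated groups of 30 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(10.0, 0.1, (30, 2))])
landmarks = adaptive_landmarks(X, m=10)
lam, U = nystrom_eigenpairs(X, landmarks, k=2)
labels = np.abs(U).argmax(axis=1)   # crude assignment standing in for k-means
```

Only the n x m block of similarities is ever formed, and the eigen-decomposition is done on the m x m block, so the O(n^3) step on the full matrix is avoided. In a full pipeline the rows of `U` would typically be normalized and passed to k-means.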

Authors: Ding Shifei, Jia Hongjie, Shi Zhongzhi

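Affiliations: School of Computer Science and Technology, China University of Mining and Technology; Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences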

Source: Journal of Software (软件学报), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 9, pp. 2037-2049 (13 pages)

Funding: National Basic Research Program of China (973 Program) (2013CB329502); National Natural Science Foundation of China (61379101)

Keywords: big data; spectral clustering; eigen-decomposition; Nyström extension; adaptive sampling

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]


