
基于自适应Nystrm采样的大数据谱聚类算法 被引量:26

Spectral Clustering Algorithm Based on Adaptive Nystrm Sampling for Big Data Analysis
Abstract: Spectral clustering is a flexible and effective clustering method for data sets with complex structure. Based on spectral graph theory, it maps the data points into a low-dimensional space spanned by eigenvectors, which optimizes the data structure and yields satisfactory clustering results. However, the eigen-decomposition step usually has computational complexity O(n^3), which limits the application of spectral clustering to big data. The Nyström extension method uses a subset of points sampled from the data set to approximate the true eigenspace, which effectively reduces the computational complexity and offers a new approach to spectral clustering on big data. The choice of sampling strategy is critical for the Nyström extension. This paper designs an adaptive Nyström sampling method in which the sampling probability of every data point is updated after each sampling pass, and proves theoretically that the sampling error decreases exponentially as the number of sampling passes increases. Based on the adaptive Nyström sampling method, a spectral clustering algorithm for big data is proposed, and its feasibility and effectiveness are verified experimentally.
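Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2014, No. 9, pp. 2037-2049 (13 pages)
Funding: National Basic Research Program of China (973 Program) (2013CB329502); National Natural Science Foundation of China (61379101)
Keywords: big data; spectral clustering; eigen-decomposition; Nyström extension; adaptive sampling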
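
To make the idea in the abstract concrete, the following is a minimal sketch of spectral clustering with a Nyström approximation, not the paper's algorithm. It assumes a Gaussian affinity, uniform landmark sampling, and a small hand-written k-means; the adaptive update of sampling probabilities described in the abstract is not reproduced, because the abstract does not specify it. All function names (`gaussian_affinity`, `nystrom_embedding`, `kmeans`) and parameter values are hypothetical.

```python
import numpy as np


def gaussian_affinity(A, B, sigma):
    """Gaussian (RBF) affinity between every row of A and every row of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def nystrom_embedding(X, m, k, sigma, rng):
    """Approximate the top-k eigenvectors of the normalized affinity matrix
    from m sampled landmark points, without forming the n x n matrix."""
    n = X.shape[0]
    # Assumption: uniform sampling without replacement (NOT the adaptive scheme).
    idx = rng.choice(n, size=m, replace=False)
    C = gaussian_affinity(X, X[idx], sigma)        # n x m block of the affinity matrix
    W = 0.5 * (C[idx] + C[idx].T)                  # m x m landmark block, symmetrized
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-8 * vals.max()                # drop near-zero eigenvalues (pseudo-inverse)
    F = C @ (vecs[:, keep] / np.sqrt(vals[keep]))  # full affinity is approximated by F @ F.T
    d = F @ (F.T @ np.ones(n))                     # approximate degree of each point
    G = F / np.sqrt(np.maximum(d, 1e-12))[:, None]
    # Left singular vectors of G are eigenvectors of the approximate
    # normalized affinity D^{-1/2} K D^{-1/2}; cost is O(n m^2), not O(n^3).
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    V = U[:, :k]
    return V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)


def kmeans(V, k, rng, iters=50):
    """Plain Lloyd iterations with farthest-point initialization."""
    centers = [V[rng.integers(len(V))]]
    for _ in range(k - 1):
        dist = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((V[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = V[labels == j].mean(axis=0)
    return labels


# Toy data: two well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
               rng.normal(6.0, 0.5, size=(200, 2))])
truth = np.repeat([0, 1], 200)

embedding = nystrom_embedding(X, m=20, k=2, sigma=1.0, rng=rng)
labels = kmeans(embedding, k=2, rng=rng)
agreement = np.mean(labels == truth)
accuracy = max(agreement, 1.0 - agreement)         # cluster labels are arbitrary up to a swap
print(f"clustering accuracy: {accuracy:.3f}")
```

In this sketch only the n x m block `C` is ever computed, so the eigen-decomposition cost depends on the number of landmarks m rather than on n^3. Replacing the uniform `rng.choice` call with a probability vector that is updated after each sampling pass is where an adaptive strategy such as the one the abstract describes would plug in.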