期刊文献+

图松弛优化聚类的快速近似提升方法 被引量:1

Fast Approximate Approaches for Graph-Based Relaxed Optimization Clustering
下载PDF
导出
摘要 基于图松弛优化为非近似迭代方法提供了有效的分析解决方案,且实现简单。然而,由于矩阵的逆在计算时需要多项式时间,则在运行速度方面不是很理想,当面对较大规模数据时此方法将变得不可行。提出了对基于图松弛优化聚类进行快速近似提升的两种方法:一个是基于k均值聚类,另一个是基于随机投影树。广泛实验表明,这些算法在运算速度方面表现较优,聚类精度变化非常小。具体来讲,该算法在运算大规模数据时精度优于k均值算法,并且在保证精度的情况下运行速度远快于基于图松弛优化聚类算法。值得注意的是,该算法可以使得单个机器在数分钟内对具有数百万样本的数据集进行聚类。 Due to its easy implementation,the graph-based relaxed optimization indeed provides an effective analytical solution for non-approximation iterative methods.However,due to the inverse of the matrix,such an optimization will run slowly and even become impractical for large-scale data.This paper develops two general approaches for fast graph-based relaxed optimization clustering.One is based on k-means clustering,and the other is based on random projection tree.Extensive experiments show that these two proposed approaches can achieve significant acceleration without degrading the clustering accuracy a lot.In particular,the approaches have better clustering performance than the classical k-mean algorithm on large-scale data,and run faster than the graph-based relaxed optimization clustering algorithms,with comparable accuracy.It is worth noting that the proposed approaches in this paper allow a single machine to cluster millions of data samples within minutes.
作者 谢磊 王士同 XIE Lei;WANG Shitong(School of Digital Media,Jiangnan University,Wuxi,Jiangsu 214122,China)
出处 《计算机科学与探索》 CSCD 北大核心 2018年第4期642-652,共11页 Journal of Frontiers of Computer Science and Technology
基金 国家自然科学基金 No.61170122~~
关键词 无监督学习 基于图松弛优化聚类 数据量化 高维数据 快速近似 unsupervised learning graph-based clustering data quantization high dimensional data fast approximate
  • 相关文献

参考文献3

二级参考文献22

  • 1BEZDEK J C, EHRLICH R, FULL W. FCM: the fuzzy c-means clustering algorithm[J]. Computers & Geosciences, 1984, 10(2): 191-203.
  • 2CAN F, DROCHAK N D II. Incremental clustering for dynamic document databases[C]//Proceedings of the 1990 Symposium on Applied Computing. Fayetteville, AR, USA, 1990: 61-67.
  • 3KAUFMAN L, ROUSSEEUW P J. Finding groups in data: an introduction to cluster analysis[M]. New York: John Wiley & Sons, 2009: 830-832.
  • 4GUHA S, RASTOGI R, SHIM K. Cure: an efficient clustering algorithm for large databases[J]. Information systems, 2001, 26(1): 35-58.
  • 5CAN F. Incremental clustering for dynamic information processing[J]. ACM transactions on information systems, 1993, 11(2): 143-164.
  • 6CAN F, FOX E A, SNAVELY C D, et al. Incremental clustering for very large document databases: Initial MARIAN experience[J]. Information sciences, 1995, 84(1/2): 101-114.
  • 7ZHANG Tian, RAMAKIRSHNAN R, LIVNY M. BIRCH: An efficient data clustering method for very large databases[C]//Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. New York, USA, 1996: 103-114.
  • 8NG R T, HAN Jiawei. CLARANS: A method for clustering objects for spatial data mining[J]. IEEE transactions on knowledge and data engineering, 2002, 14(5): 1003-1016.
  • 9SHANKER B U, PAL N R. FFCM: An effective approach for large data sets[C]//Proceedings of the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing. Iizuka, Japan, 1994: 331-332.
  • 10CHENG Taiwai, GOLDGOF D B, HALL L O. Fast clustering with application to fuzzy rule generation[C]//Proceedings of 1995 IEEE International Fuzzy Systems, 1995. International Joint Conference of the Fourth IEEE International Conference on Fuzzy Systems and The Second International Fuzzy Engineering Symposium. Yokohama, Japan, 1995: 2289-2295.

共引文献23

同被引文献11

引证文献1

二级引证文献10

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部