
Research on an Efficient Clustering General Framework in Data Mining (数据挖掘中一种高效的聚类通用框架研究)
Cited by: 2
Abstract: With the rapid development of sensor and Internet technologies, datasets have grown explosively in size, while storage and processing capacity continue to lag behind. Existing data clustering algorithms require a large number of measurements and incur a high running time. To solve clustering problems on large datasets efficiently, this paper proposes a general framework for active hierarchical clustering that repeatedly runs an off-the-shelf clustering algorithm on small subsets of the data, preserving clustering performance while reducing both measurement complexity and runtime complexity. The framework is then instantiated with the spectral clustering algorithm. Theoretical analysis shows that, for a dataset of n objects, the resulting algorithm recovers all clusters of size Ω(lg n) using O(n lg² n) similarities and runs in O(n lg³ n) time. Finally, comprehensive simulation experiments demonstrate further favorable properties of the proposed framework.
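The core idea of the abstract (cluster a small random subset, then place the remaining objects using only their similarities to that subset, and recurse) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `seed_split`, `active_cluster`, and `flatten`, the constant `c`, the leaf size `min_size`, and the toy data are all assumptions made for illustration, and the simple seed-based splitter stands in for the spectral clustering step described in the paper.

```python
import math
import random


def seed_split(items, sim):
    """Hypothetical stand-in for the off-the-shelf clusterer (NOT spectral
    clustering): take the least-similar pair as seeds and attach every
    other item to the seed it is more similar to."""
    pairs = ((x, y) for i, x in enumerate(items) for y in items[i + 1:])
    a, b = min(pairs, key=lambda p: sim(p[0], p[1]))
    left, right = [a], [b]
    for x in items:
        if x != a and x != b:
            (left if sim(x, a) >= sim(x, b) else right).append(x)
    return left, right


def active_cluster(items, sim, rng, c=4, min_size=4):
    """Build a hierarchy: a list is a leaf, a 2-tuple is an internal node.
    Items must be distinct and hashable."""
    n = len(items)
    if n < 2 or n <= min_size:
        return list(items)
    # Cluster only a small random subset of roughly c * log2(n) objects.
    m = min(n, max(2, math.ceil(c * math.log2(n))))
    sample = rng.sample(items, m)
    left_seed, right_seed = seed_split(sample, sim)
    left, right = list(left_seed), list(right_seed)
    sampled = set(sample)
    # Place each unsampled object on the side whose sampled members it
    # resembles more on average; only similarities to the sample are queried.
    for x in items:
        if x in sampled:
            continue
        avg_left = sum(sim(x, s) for s in left_seed) / len(left_seed)
        avg_right = sum(sim(x, s) for s in right_seed) / len(right_seed)
        (left if avg_left >= avg_right else right).append(x)
    return (active_cluster(left, sim, rng, c, min_size),
            active_cluster(right, sim, rng, c, min_size))


def flatten(node):
    """Return all items stored under a node of the hierarchy."""
    if isinstance(node, list):
        return node
    return flatten(node[0]) + flatten(node[1])


# Toy data (assumed): two well-separated groups of 20 points on a line.
coords = [0.01 * i for i in range(20)] + [100 + 0.01 * i for i in range(20)]
sim = lambda i, j: -abs(coords[i] - coords[j])  # larger means more similar
tree = active_cluster(list(range(40)), sim, random.Random(0))
top_level = {frozenset(flatten(child)) for child in tree}
```

On this toy input the top-level split separates the two groups. With only 40 objects the subset is not much smaller than the data, so the reduction in similarity queries suggested by the O(n lg² n) bound would only become visible for much larger n.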
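Authors: 高芹 (Gao Qin), 陈亚 (Chen Ya)
Source: Science Technology and Engineering (《科学技术与工程》), Peking University Core Journal, 2014, No. 16, pp. 112-118 (7 pages)
Funding: Supported by a university-level research project of Hubei Polytechnic University (12xjz41Q)
Keywords: datasets; clustering; measurement; framework; runtime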
