
An Improved Dynamical K-means Clustering Algorithm (cited by: 8)

Abstract Classical k-means clustering handles only static data. To address this limitation, this paper proposes an improved k-means algorithm for dynamic data, called the Dynamical K-means algorithm. Building on classical k-means, the method analyzes each sample newly added to a dynamically changing data set and, depending on how the clustering objective function actually changes, either assigns the sample to the most similar cluster and updates that cluster locally or reruns classical k-means globally. This distinguishes cases in which clustering concept drift has occurred from cases in which it has not, which enables online clustering of dynamic data and avoids the slowdown that classical k-means incurs by reclustering the entire data set after every change. Experiments on standard data sets and an artificial social network data set show that, compared with classical k-means, the proposed method handles dynamic data clustering quickly and efficiently and effectively detects the concept drift that arises during clustering.
Author Hu Wei (胡伟)
Source Computer Systems & Applications (《计算机系统应用》), 2013, No. 5, pp. 116-121 (6 pages)
Keywords K-means clustering; dynamical K-means algorithm; dynamical data; concept drift
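
The abstract describes the update rule only in outline: a new sample either triggers a local update of its most similar cluster or a global rerun of classical k-means, depending on how the objective function changes. The sketch below is one possible reading of that outline, not the paper's actual procedure. The class name `DynamicKMeans`, the parameter `drift_ratio`, and the drift test itself (the new sample's squared distance to its nearest centroid compared with the current mean per-sample cost) are all assumptions introduced for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=50, seed=0):
    """Classical (Lloyd) k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


class DynamicKMeans:
    """Hypothetical sketch: local update unless the new sample looks like drift."""

    def __init__(self, points, k, drift_ratio=3.0):
        self.k = k
        self.drift_ratio = drift_ratio  # assumed threshold, not from the paper
        self.points = [tuple(p) for p in points]
        self._recluster()

    def _recluster(self):
        """Global step: rerun classical k-means on all data seen so far."""
        self.centroids, self.labels = kmeans(self.points, self.k)
        self.sse = sum(
            sq_dist(p, self.centroids[lab]) for p, lab in zip(self.points, self.labels)
        )

    def add(self, point):
        """Add one sample. Returns True if a global recluster was triggered."""
        point = tuple(point)
        j = nearest(point, self.centroids)
        cost = sq_dist(point, self.centroids[j])
        mean_cost = self.sse / len(self.points)
        self.points.append(point)
        if cost > self.drift_ratio * mean_cost:
            # Assumed drift test: the sample fits no existing cluster well.
            self._recluster()
            return True
        # Local step: incremental mean update of the most similar cluster.
        n = self.labels.count(j) + 1
        self.centroids[j] = [c + (x - c) / n for c, x in zip(self.centroids[j], point)]
        self.labels.append(j)
        # Exact SSE increase when one point joins a cluster of n - 1 members.
        self.sse += cost * (n - 1) / n
        return False


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    model = DynamicKMeans(data, k=2)
    print(model.add((0.5, 0.6)))  # sample close to an existing cluster
    print(model.add((5, 20)))     # sample far from both clusters
    print(sorted(model.centroids))
```

The local branch costs one nearest-centroid search and one mean update per sample, whereas the global branch reclusters everything, so the speedup claimed in the abstract would come from taking the global branch only when drift is suspected. How the paper actually measures the change in the objective function is not stated in the abstract, so the threshold test here should be treated as a placeholder.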