An Improved Dynamic k-Means Clustering Algorithm (cited by: 8)



Abstract: The classical k-means clustering method can only cluster static data. To address this limitation, this paper proposes an improved dynamic k-means clustering algorithm that can handle dynamic data, called the Dynamical K-means algorithm. Building on classical k-means, the method analyzes and processes each sample newly added to a dynamically changing data set. Depending on how the clustering objective function actually changes, it either updates the most similar cluster locally or reruns classical k-means globally. This lets it distinguish cases in which clustering concept drift has occurred from cases in which it has not, so dynamic data can be clustered online. It also avoids the slowdown that classical k-means incurs on dynamic data, where all data must be reclustered at every step. Experimental results on standard data sets and an artificial social network data set show that, compared with classical k-means, the proposed dynamic k-means method handles dynamic data clustering quickly and efficiently and effectively detects the concept drift that arises during the clustering of dynamic data.
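The local-update-or-global-recluster decision described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the abstract does not state the drift criterion, so the rule used here (recluster when the increase in the sum of squared errors exceeds `drift_factor` times the current mean per-sample cost) is an assumption, as are all names (`DynamicKMeans`, `drift_factor`, `add`).

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Classical (Lloyd) k-means on a list of tuples; returns (centers, labels)."""
    centers = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, new_labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centers, labels

class DynamicKMeans:
    """Hypothetical online wrapper: local update unless the objective jumps."""

    def __init__(self, points, k, drift_factor=10.0):
        self.k = k
        self.drift_factor = drift_factor  # assumed threshold, not from the paper
        self.points = list(points)
        self.global_runs = 0
        self._recluster()

    def _recluster(self):
        # Global step: rerun classical k-means on all data seen so far.
        self.centers, self.labels = kmeans(self.points, self.k)
        self.counts = [self.labels.count(j) for j in range(self.k)]
        self.sse = sum(sq_dist(p, self.centers[l])
                       for p, l in zip(self.points, self.labels))
        self.global_runs += 1

    def add(self, x):
        """Insert one sample; returns 'local' or 'global'."""
        j = min(range(self.k), key=lambda c: sq_dist(x, self.centers[c]))
        n = self.counts[j]
        # Exact SSE increase if x joins cluster j and its mean is recomputed.
        delta = n / (n + 1) * sq_dist(x, self.centers[j])
        mean_cost = self.sse / len(self.points)
        self.points.append(x)
        if delta > self.drift_factor * mean_cost:
            self._recluster()  # treated as concept drift (assumed rule)
            return "global"
        # Local step: move only the nearest center to its new mean.
        self.centers[j] = tuple((n * c + xi) / (n + 1)
                                for c, xi in zip(self.centers[j], x))
        self.counts[j] += 1
        self.labels.append(j)
        self.sse += delta
        return "local"
```

With two tight clusters, a sample that lands close to an existing center costs one center update, while a distant sample triggers a full k-means run over all data. The value of `drift_factor` controls that trade-off and would need tuning per data set.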

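Author: Hu Wei (胡伟)

Affiliation: Experimental Teaching Center, Shanxi University of Finance and Economics

Source: Computer Systems & Applications (《计算机系统应用》), 2013, No. 5, pp. 116-121 (6 pages)

Keywords: K-means clustering; dynamical K-means algorithm; dynamical data; concept drifting

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]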


