Selective K-means clustering ensemble based on random sampling

Cited by: 4


Abstract: Without prior information about the data distribution, parameters, or class labels, the correctness of some base clusterings cannot be guaranteed, which degrades the performance of a clustering ensemble. Moreover, different base clusterings contribute to the ensemble to different degrees, so treating them all equally limits the improvement the ensemble can achieve. To address this problem, this paper proposes a selective K-means clustering ensemble based on random sampling (RS-KMCE). In RS-KMCE, the random sampling strategy keeps the selection of base clusterings from becoming trapped in a local minimum, and a combined evaluation index defined from diversity and accuracy helps the algorithm converge quickly to a better subset of base clusterings, improving ensemble performance. Experimental results on two synthetic datasets and four UCI datasets show that RS-KMCE outperforms K-means, the K-means clustering ensemble (KMCE), and the Bagging-based selective K-means clustering ensemble (BA-KMCE).
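The selection idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function names, the Rand-style `agreement` measure, the equal weighting `alpha=0.5` of accuracy and diversity, the use of the full-ensemble consensus as a label-free stand-in for accuracy, and the consensus step (K-means on the rows of the averaged co-association matrix). It is not the authors' RS-KMCE implementation.

```python
import numpy as np

def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's K-means with random initial centers; returns labels."""
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassoc(labels):
    """n x n matrix: 1.0 where two points share a cluster, else 0.0."""
    return (labels[:, None] == labels[None, :]).astype(float)

def agreement(a, b):
    """Fraction of point pairs on which two partitions agree (Rand-style)."""
    return float((coassoc(a) == coassoc(b)).mean())

def subset_score(subset, reference, alpha=0.5):
    """ASSUMED combined index: alpha * accuracy + (1 - alpha) * diversity."""
    accuracy = np.mean([agreement(p, reference) for p in subset])
    pairs = [(i, j) for i in range(len(subset))
             for j in range(i + 1, len(subset))]
    diversity = 1.0 - np.mean([agreement(subset[i], subset[j])
                               for i, j in pairs])
    return alpha * accuracy + (1 - alpha) * diversity

def consensus(partitions, k, rng):
    """ASSUMED consensus: K-means on rows of the mean co-association matrix."""
    avg = np.mean([coassoc(p) for p in partitions], axis=0)
    return kmeans(avg, k, rng)

def rs_kmce_sketch(X, k, n_base=10, subset_size=4, n_trials=20, seed=0):
    """Randomly sample subsets of base clusterings and keep the best-scoring one."""
    rng = np.random.default_rng(seed)
    base = [kmeans(X, k, rng) for _ in range(n_base)]
    reference = consensus(base, k, rng)  # label-free proxy for "accuracy"
    best_idx, best_score = None, -np.inf
    for _ in range(n_trials):
        idx = rng.choice(n_base, size=subset_size, replace=False)
        score = subset_score([base[i] for i in idx], reference)
        if score > best_score:
            best_idx, best_score = idx, score
    final = consensus([base[i] for i in best_idx], k, rng)
    return final, sorted(int(i) for i in best_idx), float(best_score)

# Toy data: two well-separated Gaussian blobs (hypothetical example input).
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0, 0.3, (30, 2)),
               data_rng.normal(5, 0.3, (30, 2))])
labels, chosen, score = rs_kmce_sketch(X, k=2)
print("selected base clusterings:", chosen, "score:", score)
```

Sampling subsets at random, rather than adding base clusterings greedily, is one simple way to read the abstract's claim that random sampling avoids local minima in the selection step; the paper's actual sampling and evaluation procedure may differ.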

Authors: WANG Lijuan, HAO Zhifeng, CAI Ruichu, WEN Wen

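Affiliations: School of Computer Science and Engineering, South China University of Technology; School of Computers, Guangdong University of Technology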

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2013, Issue 7, pp. 1969-1972 (4 pages)

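Funding: National Natural Science Foundation of China (61070033, 61100148, 61202269); Natural Science Foundation of Guangdong Province (S2011040004804); Guangdong Province Science and Technology Planning Project (2010B050400011); Open Project of the State Key Laboratory for Novel Software Technology (KFKT2011B19); Guangdong Universities Outstanding Young Innovative Talents Cultivation Project (LYM11060); Guangzhou Science and Technology Planning Project (12C42111607, 201200000031); Panyu District Science and Technology Planning Project (2012-Z-03-67)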

Keywords: clustering ensemble; selective clustering ensemble; random sampling; evaluation index of clustering; K-means

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]


