Enhanced clustering ensemble algorithm based on characteristics of data sets

Cited by: 5

Abstract: Popular clustering ensemble algorithms cannot adapt their processing to the differing characteristics of different data sets. To overcome this defect, a new algorithm is proposed: the Enhanced Clustering Ensemble algorithm based on Characteristics of Data sets (ECECD). ECECD consists of three parts: generation of base clusterings, selection of base clusterings, and a consensus function. Guided by the characteristics of the data set, it uses a heuristic method to select suitable base clusterings, forms the final ensemble from them, and produces the final clustering. In the experiments, three benchmark data sets (ecoli, leukaemia and Vehicle) were clustered. The clustering errors of the proposed algorithm were 0.014, 0.489 and 0.361 respectively (the Chinese-language abstract gives 0.479 for Vehicle), always the lowest when compared with the Bagging based Structure Ensemble Approach (BSEA), Hybrid Cluster Ensemble (HCE) and Cluster-Oriented Ensemble Classifier (COEC) algorithms. When the number of candidate base clusterings was increased, the Normalized Mutual Information (NMI) values of the proposed algorithm also remained higher than those of the compared algorithms. The experimental results indicate that, among the compared clustering ensemble algorithms, the proposed algorithm has the highest clustering precision and the strongest scalability.
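The abstract names NMI as an evaluation measure and a consensus function as one component, but gives neither formula. As a rough illustration only, the sketch below computes NMI between two labelings and builds a co-association matrix, a common building block for consensus functions. The square-root normalization of NMI, the co-association approach, and the toy labelings are all assumptions made for this example; they are not taken from the paper, and ECECD's own selection heuristic and consensus function are not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """NMI of two labelings, normalized by sqrt(H(a) * H(b)) (assumed variant)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same objects")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # Mutual information: sum over label pairs that actually co-occur.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = math.sqrt(entropy(a) * entropy(b))
    # Convention chosen here: a single-cluster labeling yields NMI 0.0.
    return mi / denom if denom > 0 else 0.0

def co_association(partitions):
    """Fraction of base clusterings that put objects i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy data: a reference labeling and three base clusterings.
truth = [0, 0, 0, 1, 1, 1]
base = [
    [1, 1, 1, 0, 0, 0],  # same partition as truth, labels permuted
    [0, 0, 1, 1, 2, 2],
    [0, 1, 0, 1, 0, 1],
]
scores = [nmi(truth, p) for p in base]
matrix = co_association(base)
```

Because NMI depends only on how objects are grouped, the first base clustering scores 1 against the reference even though its label names are swapped. A final clustering could then be obtained by, for example, running hierarchical clustering on `matrix`, which is one possible consensus step among many.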

Authors: HOU Yong (侯勇), ZHENG Xuefeng (郑雪峰)

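Affiliations: School of Computer and Communication Engineering, University of Science and Technology Beijing; School of Science and Humanities, Shandong Vocational College of Economics and Business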

Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2013, No. 8, pp. 2204-2207, 2249 (5 pages)

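Funding: Shandong Province Enterprise Training and Staff Education Research Project (2012-277); Weifang Social Science Planning Key Project (潍社科学术委发[2011]2号); Shandong Province Higher Education Humanities and Social Sciences Research Program (J08WG71)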

Keywords: base clustering; consensus function; clustering ensemble algorithm; clustering error; adaptivity; Normalized Mutual Information (NMI)

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


