Abstract
To address the problem that existing ensemble clustering algorithms usually default to K-means as the base clustering generator, which ensures diversity among clustering members but ignores that poor base clusterings may severely disturb the final clustering result, a two-stage ensemble algorithm based on clustering quality is proposed. Because K-means runs efficiently but yields relatively coarse clusterings, the generation stage first uses K-means to generate base clustering members and then applies a group agreement measure to select members with both high quality and strong diversity, forming a candidate ensemble. Next, in the ensemble stage, information entropy is used to construct a co-association matrix weighted by base clustering. Finally, a consensus function is applied to obtain the final clustering result. Comparative experiments using three evaluation indexes on ten real datasets show that the algorithm effectively improves the accuracy of clustering results while maintaining good robustness.
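The weighted co-association step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the entropy-based weighting formula, so the sketch below takes the weights as caller-supplied inputs; the function names `label_entropy` and `weighted_co_association`, the toy labelings, and the weight values are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def label_entropy(labels):
    """Shannon entropy (in bits) of the cluster-size distribution of one labeling."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_co_association(labelings, weights):
    """Weighted share of base clusterings that place each pair of samples together.

    `weights` holds one non-negative value per base clustering. How the paper
    derives these from information entropy is not stated in the abstract, so
    they are treated here as given (an assumption of this sketch).
    """
    n = len(labelings[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


# Toy example: three base clusterings of four samples (illustrative values only).
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
weights = [2.0, 1.0, 1.0]  # hypothetical quality weights
co = weighted_co_association(labelings, weights)
```

Each entry `co[i][j]` lies in [0, 1] and can be passed to a consensus function, for example by treating `1 - co[i][j]` as a distance for hierarchical clustering; that choice of consensus function is likewise an assumption, not something the abstract specifies.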
Authors
闫晨
杨有龙
刘原园
YAN Chen; YANG Youlong; LIU Yuanyuan (School of Mathematics and Statistics, Xidian University, Xi’an 710126, China)
Source
《吉林大学学报(理学版)》
CAS
Peking University Core Journal (北大核心)
2023, No. 4, pp. 899-908 (10 pages)
Journal of Jilin University:Science Edition
Funding
Natural Science Basic Research Program of Shaanxi Province (Grant No. 2021JM-133).
Keywords
ensemble clustering
clustering quality
group agreement
information entropy
consensus function