Abstract
Most existing clustering ensemble methods weight the base clusterings to reduce the influence of low-quality base clusterings on the ensemble result, thereby improving the performance of the clustering ensemble algorithm. However, these methods usually give equal weight to all data features, ignoring the correlations and differences between features, and therefore cannot accurately reflect the similarity between samples. To address this problem, this paper proposes a metric-learning-guided weighted clustering ensemble algorithm that integrates Mahalanobis distance metric learning with base clustering weight learning to improve clustering ensemble performance. In each round, the algorithm performs the clustering ensemble using the base clustering weights obtained in the previous round. It then constructs pairwise constraints from the ensemble result and learns a Mahalanobis distance metric from them, pulling samples assigned to the same cluster closer together and pushing samples assigned to different clusters farther apart. Finally, it learns weights for the base clusterings in the new projection space. Experiments on several public datasets show that the proposed algorithm outperforms existing representative clustering ensemble algorithms.
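The abstract does not give the paper's objective functions or update rules, so the following is only a minimal sketch of the alternating loop it describes, built on simplifying assumptions of our own rather than the authors' method. Assumed here: (a) the ensemble step links two samples when their weighted co-association exceeds a threshold (0.5, chosen arbitrarily) and takes connected components as clusters; (b) the Mahalanobis matrix is restricted to a diagonal whose entries are, per feature, the ratio of mean squared differences over cannot-link (different-cluster) pairs to must-link (same-cluster) pairs, a Fisher-style stand-in for a learned metric; (c) each base clustering's weight is its between/within pair-distance ratio in the learned space, normalised to sum to 1. All function names and parameters are hypothetical.

```python
from itertools import combinations

EPS = 1e-9  # guards divisions when a spread is zero


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def co_association(base_labels, weights):
    """Weighted share of base clusterings that put samples i and j in the same cluster."""
    n = len(base_labels[0])
    total = sum(weights)
    ca = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_labels, weights):
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                ca[i][j] += w / total
                ca[j][i] += w / total
    return ca


def consensus(base_labels, weights, threshold=0.5):
    """Link pairs whose weighted co-association exceeds threshold; clusters are connected components."""
    n = len(base_labels[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    ca = co_association(base_labels, weights)
    for i, j in combinations(range(n), 2):
        if ca[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}  # relabel components as 0, 1, 2, ... in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def pair_stats(X, labels, metric):
    """Mean squared diagonal-Mahalanobis distance over same-cluster and different-cluster pairs."""
    same, diff = [], []
    for i, j in combinations(range(len(X)), 2):
        d2 = sum(m * (a - b) ** 2 for m, a, b in zip(metric, X[i], X[j]))
        (same if labels[i] == labels[j] else diff).append(d2)
    return _mean(same), _mean(diff)


def learn_diag_metric(X, labels):
    """Diagonal M: per-feature ratio of cannot-link to must-link spread, rescaled to sum to dim."""
    dim = len(X[0])
    ratios = []
    for d in range(dim):
        unit = [1.0 if k == d else 0.0 for k in range(dim)]
        within, between = pair_stats(X, labels, unit)
        ratios.append(between / (within + EPS))
    total = sum(ratios)
    if total <= EPS:  # e.g. a single consensus cluster: no cannot-link pairs, keep Euclidean
        return [1.0] * dim
    return [dim * r / total for r in ratios]


def clustering_weights(X, base_labels, metric):
    """Weight each base clustering by its between/within pair-distance ratio in the learned space."""
    scores = []
    for labels in base_labels:
        within, between = pair_stats(X, labels, metric)
        scores.append(between / (within + EPS))
    total = sum(scores)
    if total <= EPS:
        return [1.0 / len(base_labels)] * len(base_labels)
    return [s / total for s in scores]


def metric_guided_ensemble(X, base_labels, rounds=5):
    weights = [1.0 / len(base_labels)] * len(base_labels)  # start from equal weights
    metric = [1.0] * len(X[0])  # start from the Euclidean metric
    for _ in range(rounds):
        labels = consensus(base_labels, weights)  # ensemble with last round's weights
        metric = learn_diag_metric(X, labels)  # must-/cannot-link pairs from the consensus
        weights = clustering_weights(X, base_labels, metric)
    return consensus(base_labels, weights), weights, metric


# Toy data: feature 0 separates two groups, feature 1 is within-group noise.
X = [[0.0, 0.0], [0.2, 1.0], [0.1, 2.0], [5.0, 0.1], [5.2, 1.1], [4.9, 2.0]]
base = [
    [0, 0, 0, 1, 1, 1],  # good partition
    [1, 1, 1, 0, 0, 0],  # same partition, permuted labels
    [0, 1, 2, 0, 1, 2],  # low-quality partition that follows feature 1
]
labels, weights, metric = metric_guided_ensemble(X, base)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(weights, metric)
```

On this toy input the learned metric places almost all of its weight on feature 0, and the third base clustering receives a near-zero weight. A diagonal matrix only rescales individual features; capturing the feature correlations the abstract emphasises would require a full positive semidefinite Mahalanobis matrix.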
Authors
WU Jian-guo (吴建国)
WEI Wei (魏巍)
GUO Xin-yao (郭鑫垚)
YAN Jing (闫京)
(College of Computer Science and Technology, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computer Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal (北大核心)
2023, No. 8, pp. 1607-1615 (9 pages)
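Funding
Supported by the National Natural Science Foundation of China (62276160, 61976184, 61772323)
Supported by the Natural Science Foundation of Shanxi Province (202203021211291)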
Keywords
clustering
clustering ensemble
weighting base clustering
Mahalanobis distance