Abstract
Ensemble diversity is considered a key factor in ensemble learning. Many methods exist for constructing clustering ensembles, but few of them focus specifically on producing ensembles with high diversity. This paper proposes two such methods: constructing clustering ensembles by adding noise (CEAN) and improved CEAN (ICEAN). Both introduce artificial data into the algorithm to increase the diversity of the clustering ensemble. Experiments compare CEAN and ICEAN with the clustering ensemble generation methods commonly used in the literature. The results show that CEAN and ICEAN do increase the diversity of the generated ensembles and therefore yield better clustering ensemble results at a similar average accuracy of the ensemble members.
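The idea of raising ensemble diversity by adding artificial data can be illustrated with a small sketch. The abstract does not say how CEAN or ICEAN generate or use the artificial data, so everything below is an assumption made for illustration: the artificial points are drawn uniformly from the bounding box of the real data, the base clusterer is a plain k-means, and diversity is measured as the mean pairwise disagreement (1 minus the Rand index). The names `noisy_member`, `ensemble_diversity`, and `noise_ratio` are hypothetical and do not come from the paper.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs that two labelings treat the same way."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def noisy_member(data, k, noise_ratio, rng):
    """Assumed scheme: cluster real plus artificial points, keep real labels."""
    dims = list(zip(*data))
    lo = [min(d) for d in dims]
    hi = [max(d) for d in dims]
    n_fake = int(noise_ratio * len(data))
    # Artificial points: uniform inside the bounding box of the real data.
    fake = [tuple(rng.uniform(l, h) for l, h in zip(lo, hi))
            for _ in range(n_fake)]
    labels = kmeans(data + fake, k, rng)
    return labels[:len(data)]  # discard labels of the artificial points


def ensemble_diversity(members):
    """Mean pairwise disagreement (1 - Rand index) across ensemble members."""
    pairs = list(combinations(members, 2))
    return sum(1 - rand_index(a, b) for a, b in pairs) / len(pairs)


rng = random.Random(0)
# Toy data: three Gaussian blobs in 2-D, 30 points each.
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (4, 0), (2, 3)] for _ in range(30)]
ensemble = [noisy_member(data, k=3, noise_ratio=0.3, rng=rng) for _ in range(5)]
diversity = ensemble_diversity(ensemble)
print(f"ensemble diversity: {diversity:.3f}")
```

Each ensemble member sees a different set of artificial points, which is one plausible way for the resulting partitions of the real data to differ. Whether this raises diversity without lowering member accuracy depends on the data and the amount of noise, which is the trade-off the paper evaluates experimentally.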
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2008, No. 5, pp. 682-688 (7 pages)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
Funding
Science and Technology Project of the Jiangxi Provincial Department of Education (No. 教技字[2007]208号, GJJ08285)
Keywords
Clustering Ensemble, Ensemble Diversity, Artificial Data