Abstract
Aggregating the results of multiple mutually independent clustering algorithms according to the wisdom-of-crowds principle can markedly improve the accuracy and stability of clustering. This paper proposes a clustering ensemble algorithm with cluster connection based on wisdom of crowds (CECWOC). First, the independence, decentralization, and diversity principles of wisdom-of-crowds theory guide the generation of the individual clustering results, which are produced by different clustering algorithms. Second, a clustering ensemble algorithm based on connected triples aggregates the individual clusters in groups, and the group-level results are aggregated once more to produce the final clustering. The algorithm has two advantages: 1) the clusters produced by the base clusterings are grouped and their weights adjusted, so no cluster selection step is needed and the information in the generated clusters is fully used; 2) similarities between data points are computed with the connected-triple algorithm, so relationships between points whose direct similarity is zero can still be exploited. Experiments on several data sets show that the proposed algorithm yields more accurate and stable results than traditional clustering ensemble algorithms and than ensemble algorithms that combine wisdom of crowds with machine learning.
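The connected-triple idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the weighted connected-triple formulation common in the link-based ensemble literature (Jaccard overlap as the edge weight between clusters, and the sum of minimum shared-neighbor weights as the triple score); the abstract does not give the paper's exact formulas, so the function names and the weighting below are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def clusters_from_labels(labelings):
    """Flatten several base clusterings into one list of clusters (sets of point indices)."""
    clusters = []
    for labels in labelings:
        groups = {}
        for idx, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(idx)
        clusters.extend(groups.values())
    return clusters


def jaccard(a, b):
    """Direct overlap between two clusters (assumed edge weight)."""
    return len(a & b) / len(a | b)


def connected_triple_similarity(clusters):
    """Return (direct weights, normalized connected-triple scores) for all cluster pairs.

    Two clusters i and j form a connected triple with a third cluster k
    when both overlap k; the score sums min(w[i][k], w[j][k]) over all k.
    """
    n = len(clusters)
    w = [[jaccard(clusters[i], clusters[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    wct = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        score = sum(min(w[i][k], w[j][k]) for k in range(n) if k not in (i, j))
        wct[i][j] = wct[j][i] = score
    peak = max(max(row) for row in wct)
    if peak > 0:  # scale scores into [0, 1]
        wct = [[v / peak for v in row] for row in wct]
    return w, wct


# Two hypothetical base clusterings of four data points.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1]]
clusters = clusters_from_labels(labelings)
w, wct = connected_triple_similarity(clusters)
print(clusters)
print(w[0][1], wct[0][1])
```

In this toy input the two clusters of the first clustering share no points, so their direct overlap is zero, yet both overlap the cluster {0, 1, 2} from the second clustering and therefore receive a nonzero connected-triple score. This is the kind of indirect relationship the abstract refers to.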
Authors
张恒山
高宇坤
陈彦萍
王忠民
Zhang Hengshan; Gao Yukun; Chen Yanping; Wang Zhongmin (School of Computer Science & Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121; Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi'an University of Posts and Telecommunications), Xi'an 710121)
Source
《计算机研究与发展》
EI
CSCD
PKU Core (Peking University Core Journals)
2018, Issue 12, pp. 2611-2619 (9 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China (61373116)
Shaanxi Province Science and Technology Coordination and Innovation Project (2016KTZDGY04-01)
Keywords
wisdom of crowds (WOC)
clustering ensemble
connecting triple
clustering ensemble selection (CES)
data mining