Abstract
Current research on ensemble clustering focuses mainly on optimizing the ensemble strategy, while measuring and improving the quality of the basic clusterings has received little attention. Based on information entropy theory, this paper proposes a quality measurement index for basic clusterings and, drawing on the idea of three-way decision, constructs a three-way screening method for basic clusterings. First, the three-way decision thresholds α and β for basic clustering screening are preset. Second, the average quality of the clusters in each basic clustering is calculated and used as that basic clustering's quality measurement index. Finally, the three-way decision is applied: 1) a basic clustering is deleted if its quality measurement index is less than β; 2) a basic clustering is kept if its quality measurement index is greater than or equal to α; 3) if the index is greater than or equal to β and less than α, the quality of the basic clustering is recalculated and the three-way decision is applied again, until no basic clustering is deleted or the specified number of iterations is reached. Comparative experiments show that the three-way screening method for basic clusterings can effectively improve ensemble clustering results.
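The screening loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the entropy-based quality formula, so `cluster_entropy` and `clustering_quality` below are an assumed stand-in (a cluster scores higher when the other basic clusterings do not split its members). The function names, the `max_iter` default, and the choice to retain clusterings still in the boundary region when the loop stops are likewise assumptions.

```python
import math
from collections import Counter


def cluster_entropy(members, other_labels):
    """Entropy (bits) of how another clustering splits one cluster's members."""
    counts = Counter(other_labels[i] for i in members)
    n = len(members)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def clustering_quality(idx, ensemble):
    """ASSUMED index: mean over clusters of exp(-average entropy vs. the others)."""
    others = [e for j, e in enumerate(ensemble) if j != idx]
    if not others:
        return 1.0
    clusters = {}
    for i, lab in enumerate(ensemble[idx]):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for members in clusters.values():
        h = sum(cluster_entropy(members, o) for o in others) / len(others)
        scores.append(math.exp(-h))
    return sum(scores) / len(scores)


def three_way_screen(ensemble, alpha, beta, max_iter=10):
    """Return indices of the basic clusterings that survive screening."""
    accepted = set()                      # quality >= alpha: kept for good
    boundary = set(range(len(ensemble)))  # beta <= quality < alpha: re-evaluated
    for _ in range(max_iter):
        pool = sorted(accepted | boundary)
        current = [ensemble[i] for i in pool]
        deleted = False
        for pos, i in enumerate(pool):
            if i in accepted:
                continue
            q = clustering_quality(pos, current)
            if q < beta:                  # rejection region: delete
                boundary.discard(i)
                deleted = True
            elif q >= alpha:              # acceptance region: keep
                boundary.discard(i)
                accepted.add(i)
        if not deleted:                   # stop once a pass deletes nothing
            break
    # Assumption: clusterings still in the boundary region are retained.
    return sorted(accepted | boundary)


# Three consistent label vectors and one that cuts across them.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
survivors = three_way_screen(ensemble, alpha=0.7, beta=0.5)
```

In this toy ensemble the fourth label vector disagrees with the other three, so its assumed quality score falls below β and it is removed on the first pass; the second pass deletes nothing and the loop stops.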
Authors
XU Jianfeng (徐健锋)
ZOU Weikang (邹伟康)
LIANG Wei (梁伟)
CHENG Gaojie (程高洁)
ZHANG Yuanjian (张远健)
Affiliations: School of Information Engineering, Nanchang University, Nanchang, Jiangxi 330031, China; School of Software, Nanchang University, Nanchang, Jiangxi 330047, China; College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2019, No. 11, pp. 3120-3126 (7 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (61763031, 61673301)
National Key Research and Development Program of China (213)
Keywords
three-way decision
ensemble clustering
basic clustering
three-way screening