Abstract
Preprocessing the base clustering members is an important step in ensemble clustering. Numerous studies have shown that the diversity of the set of base clustering members affects the performance of ensemble clustering algorithms. Current research on ensemble clustering focuses on generating base clusterings and optimizing the integration (consensus) strategy. The measurement and optimization of diversity among base clustering members remain underdeveloped.

This paper proposes a diversity measure for base clustering members based on Jaccard similarity. Drawing on the idea of three-way decisions, it then constructs a three-way filtering method for base clustering members. The method first sets the initial three-way decision thresholds α(0) and β(0). It then computes the diversity measure of each base clustering member and makes a three-way decision. The decision strategy is as follows:

- If a member's diversity measure is less than α(0), the member is deleted.
- If the measure is greater than β(0), the member is retained.
- If the measure lies between α(0) and β(0), the member is placed in the boundary region of the three-way decision to await further judgment.

After each round, the algorithm recomputes the thresholds (α(1) and β(1) after the first round, and so on). It then makes a new round of three-way decisions on the members left in the previous round's boundary region. This repeats until no member is assigned to the boundary region or the specified number of iterations is reached.

Comparative experiments show that the diversity-based three-way filtering method for base clustering members effectively improves ensemble clustering performance.
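The abstract describes the filtering loop but not the exact formulas. The following is a minimal Python sketch of the iterative three-way filter under stated assumptions. It is not the authors' implementation; the assumptions are:

1. The Jaccard similarity between two base clusterings is the pair-counting Jaccard index a/(a+b+c) over object pairs.
2. A member's diversity is 1 minus its mean Jaccard similarity to the other members.
3. Recomputed thresholds are mean ± λ·std of the diversity scores still in the boundary region, with a hypothetical parameter λ < 1.
4. A score exactly equal to a threshold stays in the boundary region.

The function names `pair_jaccard`, `diversity_scores` and `three_way_filter` are hypothetical. Members still in the boundary region when the loop stops are returned separately, because the abstract does not say how they are finally handled.

```python
from itertools import combinations
from statistics import fmean, pstdev


def pair_jaccard(p, q):
    """Pair-counting Jaccard index between two labelings (assumed similarity)."""
    a = b = c = 0
    for i, j in combinations(range(len(p)), 2):
        same_p, same_q = p[i] == p[j], q[i] == q[j]
        if same_p and same_q:
            a += 1          # pair co-clustered in both partitions
        elif same_p:
            b += 1          # co-clustered only in p
        elif same_q:
            c += 1          # co-clustered only in q
    # No co-clustered pairs at all: both are all-singleton partitions, i.e. identical.
    return a / (a + b + c) if (a + b + c) else 1.0


def diversity_scores(members):
    """Assumed diversity: 1 - mean Jaccard similarity to every other member."""
    if len(members) < 2:
        raise ValueError("need at least two base clustering members")
    scores = []
    for k, mk in enumerate(members):
        sims = [pair_jaccard(mk, ml) for l, ml in enumerate(members) if l != k]
        scores.append(1.0 - fmean(sims))
    return scores


def three_way_filter(scores, alpha0, beta0, lam=0.5, max_iter=10):
    """Iterative three-way decision; returns (kept, deleted, still-boundary) indices."""
    keep, delete = [], []
    boundary = list(range(len(scores)))
    alpha, beta = alpha0, beta0
    for it in range(max_iter):
        next_boundary = []
        for idx in boundary:
            d = scores[idx]
            if d < alpha:
                delete.append(idx)          # low diversity: delete
            elif d > beta:
                keep.append(idx)            # high diversity: retain
            else:
                next_boundary.append(idx)   # defer to the next round
        # After round 0 the thresholds depend only on the boundary set, so if no
        # member moved, recomputing them would give the same decisions: stop.
        stalled = it > 0 and len(next_boundary) == len(boundary)
        boundary = next_boundary
        if not boundary or stalled:
            break
        # Assumed update rule: mean +/- lam * population std of boundary scores.
        vals = [scores[i] for i in boundary]
        mu, sigma = fmean(vals), pstdev(vals)
        alpha, beta = mu - lam * sigma, mu + lam * sigma
    return sorted(keep), sorted(delete), boundary


members = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],  # duplicate of member 0
    [1, 1, 1, 0, 0, 0],  # same partition as member 0, labels swapped
    [0, 0, 1, 1, 2, 2],
    [0, 1, 0, 1, 0, 1],
]
scores = diversity_scores(members)
keep, delete, boundary = three_way_filter(scores, alpha0=0.3, beta0=0.8)
print(keep, delete, boundary)  # → [3, 4] [0, 1, 2] []
```

With λ < 1, every round in which the boundary scores are not all equal moves at least one member out of the boundary region. If a member stayed within λ·σ of the mean, the variance could not exceed λ²σ². Identical members receive identical scores in this per-member sketch, so they are always filtered together.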
Authors
LIANG Wei; DUAN Xiao-dong; XU Jian-feng
(School of Software, Nanchang University, Nanchang 330047, China; School of Software Engineering, South China University of Technology, Guangzhou 510006, China; College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; Tellhow Software Co., Ltd., Nanchang 330096, China)
Source
Computer Science (《计算机科学》), 2021, No. 1, pp. 136-144 (9 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (61763031)
Natural Science Foundation of Jiangxi Province (20202BAB202018)
Keywords
Basic clustering filtering
Three-way decision
Three-way optimization
Clustering ensemble
Differential measurement