Abstract
For complex data with overlapping clusters and imbalanced distributions, this paper proposes a two-stage information granulation algorithm, guided by the credible granularity criterion, that combines Interval Type-2 Fuzzy Rough C-Means (IT2FRCM) clustering with mixed metrics. In the first stage, the IT2FRCM algorithm clusters the raw data to obtain the initial information granules. In the second stage, taking into account the spatial distribution of the data, the sample size, and the granule properties, a mixed-metric method is used to design a granulation function that balances the rationality of evidence against semantic uniqueness; a composite function of coverage and uniqueness is then optimized under the credible granularity criterion to find the optimal granule boundaries. Experiments on artificial and UCI data sets show that the algorithm improves the information granulation quality and granule representativeness for imbalanced data, and performs well on indicators such as the number of correctly classified samples and granule characteristics.
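The second stage can be illustrated with a minimal sketch. The abstract does not give the paper's actual granulation function, so the code below assumes the generic one-dimensional form often used with justifiable-granularity criteria: a score equal to coverage times a linearly decreasing uniqueness (specificity) term, maximized over candidate upper bounds. The function names, the linear uniqueness term, and the grid search are all assumptions made for illustration, and the mixed metrics and density weighting described above are not modeled.

```python
import numpy as np

def coverage(data, center, bound):
    """Fraction of all samples lying between center and bound (inclusive)."""
    data = np.asarray(data, dtype=float)
    lo, hi = min(center, bound), max(center, bound)
    return float(np.mean((data >= lo) & (data <= hi)))

def specificity(center, bound, data_range):
    """Assumed uniqueness term: 1 for a zero-width granule, 0 at full range."""
    return max(0.0, 1.0 - abs(bound - center) / data_range)

def best_upper_bound(data, center, n_candidates=200):
    """Grid-search the upper bound that maximizes coverage * specificity."""
    data = np.asarray(data, dtype=float)
    data_range = data.max() - center
    if data_range <= 0:
        # No samples above the center: the granule collapses to the center.
        return float(center)
    candidates = np.linspace(center, data.max(), n_candidates)
    scores = [
        coverage(data, center, b) * specificity(center, b, data_range)
        for b in candidates
    ]
    return float(candidates[int(np.argmax(scores))])

# A tight group near the center plus one distant outlier.
samples = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 10.0]
bound = best_upper_bound(samples, center=0.0)
```

In this toy case the chosen bound sits just past the tight group and excludes the outlier, because widening the granule to reach the outlier drives the uniqueness term to zero. A lower bound could be found the same way by searching below the center.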
Authors
SHAO Lijie (邵丽洁)
MA Fumin (马福民)
College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2021, No. 6, pp. 88-97 (10 pages)
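Funding
National Natural Science Foundation of China (61973151)
Natural Science Foundation of Jiangsu Province (BK20191406)
Major Project of Natural Science Research of Jiangsu Higher Education Institutions (17KJA120001)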
Keywords
information granulation
credible granularity criterion
clustering
density
mixed metrics