Abstract
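Imbalanced data analysis is one of the important problems in the data field, and the large differences between class distributions pose a severe challenge to clustering methods. Focusing on the clustering of imbalanced data, this paper analyzes how imbalanced data affects fuzzy clustering and proposes a density-aware fuzzy clustering method. The method embeds the density characteristics of the data distribution into the initialization of fuzzy clustering to locate the initial cluster centers, which avoids losing the center of the minority class. On this basis, a density-based optimization and update procedure for fuzzy clustering is designed. Validation on datasets shows that the method effectively addresses the disappearance of the minority class in imbalanced data classification and markedly improves clustering performance over traditional methods.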
Imbalanced data analysis is a key problem in biomedical areas, and the large differences in distribution between classes pose a computational challenge for clustering methods. This paper discusses the effects of imbalanced datasets on fuzzy clustering and proposes a data-density-aware fuzzy clustering method to address them. Specifically, the dataset is segmented into areas of similar local density, and a novel fuzzy clustering algorithm is then run from this initial partition. As a result, the initial cluster centers can be located and the disappearance of the minority-class center is avoided. Building on this density-based initialization, a density-aware update procedure for fuzzy clustering is further designed. The experimental results show that the method deals better with the disappearance of the minority class in imbalanced dataset classification and that its clustering performance is clearly better than that of traditional FCM (fuzzy c-means).
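The abstract does not say how local density is measured or how the density-based update works, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a density-peaks-style score (number of neighbours within a radius, multiplied by the distance to the nearest denser point) to choose initial centers, and then runs the standard FCM membership and center updates. The names `density_aware_init`, `fuzzy_c_means`, and the `radius` parameter are hypothetical, and the synthetic 300-versus-30 dataset is illustrative only.

```python
import math
import random


def density_aware_init(points, n_clusters, radius):
    """Pick initial centers that are locally dense and far from any denser point.

    Assumption: this density-peaks-style score stands in for the paper's
    (unspecified) density measure.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than `radius`.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < radius)
           for i in range(n)]
    # Visit points from densest to sparsest (stable sort, ties keep input order).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    # The densest point has no denser neighbour, so use its largest distance.
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        # Distance to the nearest point that ranks as denser.
        delta[i] = min(dist[i][j] for j in order[:rank])
    # A small but isolated group still scores well through its large delta.
    best = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    return [list(points[i]) for i in best]


def fuzzy_c_means(points, centers, m=2.0, n_iter=100, tol=1e-6):
    """Standard FCM updates, started from the supplied centers."""
    dim = len(points[0])
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for p in points:
            # u_ik is proportional to d_ik^(-2/(m-1)); clip to avoid dividing by zero.
            inv = [max(math.dist(p, c), 1e-9) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        new_centers = []
        for k in range(len(centers)):
            # Each center is the mean of all points weighted by u_ik^m.
            weights = [u[k] ** m for u in memberships]
            w_sum = sum(weights)
            new_centers.append([
                sum(w * p[a] for w, p in zip(weights, points)) / w_sum
                for a in range(dim)
            ])
        shift = max(math.dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # `memberships` were computed in the final iteration, before the last center move.
    return centers, memberships


# Illustrative imbalanced data: 300 majority points and 30 minority points.
rng = random.Random(0)
majority = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(300)]
minority = [(rng.gauss(8.0, 0.3), rng.gauss(8.0, 0.3)) for _ in range(30)]
data = majority + minority

init = density_aware_init(data, n_clusters=2, radius=0.5)
centers, memberships = fuzzy_c_means(data, init)
```

Because the score multiplies density by isolation, the densest minority point is selected even though its neighbour count is far below that of the majority, so FCM starts with one center in each group rather than two centers splitting the majority. The pairwise distance table makes the initialization quadratic in the number of points, which is acceptable for a sketch but not for large datasets.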
Authors
WANG Jin (王进)
YOU Lei (游磊)
LI Zhongwen (黎忠文)
MIAO Fang (苗放)
School of Information Science and Engineering, Chengdu University, Chengdu 610106, China; Institute of Big Data, Chengdu University, Chengdu 610106, China
Source
Journal of Chengdu University (Natural Science Edition) (《成都大学学报(自然科学版)》)
2017, No. 4, pp. 373-376 (4 pages)
Funding
Supported by the Natural Science Foundation of the Sichuan Provincial Department of Education (17ZA0082)