
Integrated Classification Algorithm Based on Clustering and Undersampling

Cited by: 5
Abstract: Imbalanced data arise in many application domains, and traditional classifiers tend to favor the majority class, which leads to poor classification of minority-class samples. To address this problem, an ensemble classification algorithm based on clustering undersampling, ClusterUndersampling-AdaCost (CU-AdaCost), is proposed. The algorithm computes dimension-weighted Euclidean distances between samples to locate the center of each cluster, and selects the majority-class samples with strong informative features from the neighborhood of each cluster center to form a new training set. This training set is then passed to an ensemble algorithm that incorporates a cost-sensitive adjustment function, so that the model pays more attention to the minority class. Comparative experiments on six UCI datasets show that the samples drawn during undersampling are highly representative and that the algorithm effectively improves the model's classification performance on the minority class.
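The abstract does not specify the per-dimension weights, the size of the cluster-center neighborhood, or the exact cost-sensitive adjustment function, so the following is only a minimal sketch under stated assumptions, not the authors' CU-AdaCost implementation. Assumptions: feature weights are the inverse per-feature variance; the "neighborhood" of each center is approximated by keeping the samples nearest to it; the boosting stage uses the cost-adjustment function β from the original AdaCost algorithm (Fan et al.) with decision stumps as weak learners. All function names (`weighted_dist`, `cluster_undersample`, `adacost`, etc.) and the per-sample costs are hypothetical.

```python
import numpy as np


def weighted_dist(X, C, w):
    """Dimension-weighted Euclidean distance between every row of X and every row of C."""
    diff = X[:, None, :] - C[None, :, :]          # shape (n, k, d)
    return np.sqrt((w * diff ** 2).sum(axis=2))    # shape (n, k)


def cluster_undersample(X_maj, n_keep, k, w, n_iter=20, seed=0):
    """Hypothetical undersampler: weighted k-means on the majority class, then keep
    the samples nearest each center (assumed stand-in for the paper's neighborhood rule)."""
    rng = np.random.default_rng(seed)
    C = X_maj[rng.choice(len(X_maj), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = weighted_dist(X_maj, C, w).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X_maj[labels == j].mean(axis=0)
    d = weighted_dist(X_maj, C, w)
    labels = d.argmin(axis=1)
    per_cluster = max(1, n_keep // k)
    keep = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        keep.extend(idx[np.argsort(d[idx, j])][:per_cluster].tolist())
    return np.array(keep, dtype=int)


def fit_stump(X, y, D):
    """Decision stump minimizing the weighted error under distribution D."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = np.where(X[:, f] <= thr, sign, -sign)
                err = D[pred != y].sum()
                if err < best[0]:
                    best = (err, f, thr, sign)
    return best[1:]


def stump_predict(stump, X):
    f, thr, sign = stump
    return np.where(X[:, f] <= thr, sign, -sign)


def adacost(X, y, cost, T=10):
    """AdaCost-style boosting; y in {-1, +1}, cost[i] in [0, 1] (assumed higher for minority)."""
    D = cost / cost.sum()                                # cost-proportional initial weights
    model = []
    for _ in range(T):
        stump = fit_stump(X, y, D)
        h = stump_predict(stump, X)
        # Original AdaCost adjustment: beta = 0.5 - 0.5c if correct, 0.5 + 0.5c if wrong
        beta = np.where(h == y, 0.5 - 0.5 * cost, 0.5 + 0.5 * cost)
        r = np.clip((D * y * h * beta).sum(), -0.999, 0.999)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        D = D * np.exp(-alpha * y * h * beta)
        D /= D.sum()
        model.append((alpha, stump))
    return model


def predict(model, X):
    score = sum(a * stump_predict(s, X) for a, s in model)
    return np.where(score >= 0, 1, -1)


# Usage on synthetic data (hypothetical, not the paper's UCI experiments)
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(3.0, 1.0, size=(20, 2))
w = 1.0 / np.vstack([X_maj, X_min]).var(axis=0)         # assumed dimension weights
keep = cluster_undersample(X_maj, n_keep=40, k=4, w=w)
X = np.vstack([X_maj[keep], X_min])
y = np.concatenate([-np.ones(len(keep)), np.ones(len(X_min))])
cost = np.where(y == 1, 1.0, 0.5)                        # assumed misclassification costs
model = adacost(X, y, cost, T=10)
recall_min = (predict(model, X_min) == 1).mean()
print(len(keep), recall_min)
```

Keeping only the samples closest to each center favors representative, low-noise majority samples; a radius-based neighborhood or the paper's own dimension weighting could be substituted in `cluster_undersample` without changing the boosting stage.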
Authors: 周传华 (ZHOU Chuan-hua), 朱俊杰 (ZHU Jun-jie), 徐文倩 (XU Wen-qian), 邓佳佳 (DENG Jia-jia). Affiliations: School of Management Science and Engineering, Anhui University of Technology, Ma’anshan 243000, China; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, China.
Source: Computer and Modernization (《计算机与现代化》), 2021, No. 11, pp. 72-76 (5 pages).
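Funding: National Natural Science Foundation of China (71772002, 61702006); Key Laboratory of Multidisciplinary Management and Control of Complex Systems of Anhui Higher Education Institutes (CS2020-04).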
Keywords: unbalanced data; clustering; undersampling; cost sensitive
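Related literature: references (11); secondary references (60); co-citing literature (282); co-cited literature (34); citing literature (5); secondary citing literature (2).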
