
Bagging Ensemble Learning Based Multiset Class-imbalanced Learning

Cited by: 5
Abstract: Class-imbalanced classification is an active research problem in pattern recognition and machine learning, and it arises widely in real-world applications such as software defect prediction, medical diagnosis, and object detection. Existing class-imbalanced learning algorithms usually focus on balancing the dataset by reducing the number of majority-class samples or increasing the number of minority-class samples. They tend to overlook the noise samples that often occur in class-imbalanced data and the overlap between the distributions of different classes, which leaves room for improvement in classification performance. To address these problems, we propose a multiset class-imbalanced learning algorithm based on Bagging ensemble learning. The algorithm consists of two modules: Bagging-based multiset construction, and feature extraction and multiset fusion. The Bagging-based multiset construction module builds multiple balanced training sets and removes noise samples from the majority class through an improved resampling algorithm. The feature extraction and multiset fusion module uses linear discriminant analysis to improve the separability of samples from different classes and fuses the predictions of the classifiers trained on the multiple training sets. Experimental results show that the proposed method achieves good classification performance on class-imbalanced data.
Authors: XIAO Liang; HAN Lu; WEI Peng-fei; ZHENG Xin-hao; ZHANG Shang; WU Fei (School of Automation and AI, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source: Computer Technology and Development, 2021, No. 10, pp. 1-6 (6 pages)
Funding: National Natural Science Foundation of China (61702280).
Keywords: class-imbalanced learning; resampling; linear discriminant analysis; ensemble learning; multiset learning
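
The two modules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the improved resampling algorithm, the noise-removal rule, the base classifier, or the fusion rule, so the code below makes these assumptions: each balanced set pairs all minority samples with an equal-size random draw from the majority class, noise removal is omitted, a midpoint threshold on the 1-D Fisher LDA projection stands in for the base classifier, and predictions are fused by majority vote. All function names (`fit_lda`, `fit_ensemble`, `predict`) and the `ridge` parameter are hypothetical.

```python
import random


def mean_vec(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_lda(neg, pos, ridge=1e-6):
    """Two-class Fisher LDA. Returns (w, t): projection direction and threshold.

    w = Sw^{-1} (m1 - m0), where Sw is the within-class scatter matrix.
    The small ridge on the diagonal is an assumption added for numerical stability.
    """
    m0, m1 = mean_vec(neg), mean_vec(pos)
    d = len(m0)
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, m in ((neg, m0), (pos, m1)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, m)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = solve(sw, [b - a for a, b in zip(m0, m1)])
    mid = [(a + b) / 2 for a, b in zip(m0, m1)]
    # Threshold: projection of the midpoint between the two class means.
    return w, sum(wi * xi for wi, xi in zip(w, mid))


def fit_ensemble(majority, minority, n_bags=11, seed=0):
    """Train one LDA model per balanced set (majority undersampled to minority size)."""
    rng = random.Random(seed)
    k = len(minority)
    return [fit_lda(rng.sample(majority, k), minority) for _ in range(n_bags)]


def predict(models, x):
    """Majority vote over the models; 1 means minority class, 0 means majority class."""
    votes = sum(sum(wi * xi for wi, xi in zip(w, x)) > t for w, t in models)
    return int(2 * votes > len(models))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, illustrative data: 200 majority samples and 20 minority samples.
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
    models = fit_ensemble(majority, minority)
    print(predict(models, [3.0, 3.0]), predict(models, [0.0, 0.0]))
```

Using an odd `n_bags` avoids ties in the vote. A closer reproduction would insert a noise-filtering step before sampling the majority class and train a separate classifier on the LDA-projected features, both of which depend on details given only in the full paper.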
