Abstract
Data generated in many real-world domains are typically multi-class and imbalanced. In multi-class imbalanced classification, problems such as class overlap, noise, and multiple minority classes degrade classifier performance, and solving the multi-class imbalance problem effectively has become an important research topic in machine learning and data mining. Drawing on the recent literature on multi-class imbalanced classification, this paper analyzes and summarizes methods from two perspectives, data preprocessing and algorithm-level classification, and examines each algorithm in detail in terms of its advantages, disadvantages, and the datasets used. For data preprocessing, it introduces oversampling, undersampling, hybrid sampling, and feature selection methods, and compares the performance of algorithms evaluated on the same datasets. For algorithm-level classification, it describes and analyzes methods from three aspects: base classifier optimization, ensemble learning, and multi-class decomposition techniques. Finally, it summarizes future research directions for multi-class imbalanced data classification.
Authors
李昂
韩萌
穆栋梁
高智慧
刘淑娟
Li Ang; Han Meng; Mu Dongliang; Gao Zhihui; Liu Shujuan (School of Computer Science & Engineering, North Minzu University, Yinchuan 750021, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2022, No. 12, pp. 3534-3545 (12 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (62062004)
Natural Science Foundation of Ningxia (2020AAC03216, 2022AAC03279)
North Minzu University Graduate Innovation Project (YCX22191)
Keywords
classification
multi-class imbalanced data
data preprocessing method
algorithm-level classification method