Abstract
Class imbalance is one of the important problems in classification. The imbalance index of each data set is closely tied to the data set itself and serves as one of its key indicators. To address classifier design for imbalanced data sets, this paper proposes an enhanced AdaBoost (E-AdaBoost) algorithm. During iteration, the algorithm takes into account both the imbalance index and the classification accuracy of the minority class, which is the more important class in an imbalanced data set, and uses them to improve the weight update strategy of the base classifiers, thereby improving classification performance on imbalanced data sets. The E-AdaBoost-based classification design method determines the weight parameters of the base classifiers according to the imbalance index of the samples. The method is combined with several classical classifiers, evaluated experimentally on artificial and standard data sets, and compared with related methods. The results show that the E-AdaBoost-based design method effectively improves classification performance on imbalanced data sets.
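The abstract does not state the actual weight formula, so the following is only a minimal sketch of the general idea: an AdaBoost loop whose base-classifier weight is scaled by the imbalance index and the minority-class accuracy. The imbalance index definition (majority size over minority size), the scaling factor `1 + k * recall` with `k = ln(index)`, and all function names are assumptions made for illustration, not the paper's E-AdaBoost formula. Labels are assumed to be -1 and +1, with both classes present.

```python
import math

def imbalance_index(y):
    """Majority-to-minority size ratio (assumed definition, not from the paper)."""
    pos = sum(1 for v in y if v == 1)
    neg = len(y) - pos
    return max(pos, neg) / min(pos, neg)

def fit_stump(X, y, w):
    """Exhaustively pick the threshold stump with the lowest weighted error.

    Returns (weighted_error, feature_index, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] >= thr else -sign

def fit_boost(X, y, rounds=10, minority=1):
    """AdaBoost with a hypothetical imbalance-aware classifier weight."""
    n = len(y)
    w = [1.0 / n] * n
    # Assumed scaling: k = 0 for a balanced set, which recovers plain AdaBoost.
    k = math.log(imbalance_index(y))
    minority_idx = [i for i, t in enumerate(y) if t == minority]
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        if err >= 0.5:
            break  # base classifier no better than chance
        pred = [stump_predict(stump, x) for x in X]
        # Accuracy on the minority class only (its recall).
        recall = sum(1 for i in minority_idx if pred[i] == minority) / len(minority_idx)
        # Standard AdaBoost weight, boosted when the minority class is well classified.
        alpha = 0.5 * math.log((1 - err) / err) * (1 + k * recall)
        ensemble.append((alpha, stump))
        # Standard exponential sample re-weighting, then normalization.
        w = [wi * math.exp(-alpha * t * p) for wi, t, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

With this choice, a base classifier that handles the minority class well receives a larger vote as the data set becomes more imbalanced, while a balanced data set leaves the standard AdaBoost weight unchanged.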
Authors
周玉
岳学震
孙红玉
Zhou Yu; Yue Xuezhen; Sun Hongyu (School of Electrical Engineering, North China University of Water Resources & Electric Power, Zhengzhou 450011, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2023, Issue 12, pp. 3566-3571, 3577 (7 pages in total)
Application Research of Computers
Funding
National Natural Science Foundation of China (U1504622, 31671580)
Training Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (2018GGJS079).