Abstract
To address the poor classification performance on class-imbalanced data, this paper proposes a hybrid Sampling Algorithm Based on the average classification Error Rate of samples within clusters (SABER). The algorithm first applies SMOTE to the minority classes to increase their sample counts, then adds a portion of the samples of each class to a balanced sample set and trains an initial classifier on that set. It then runs multiple iterations. In each iteration, K-means clusters the remaining majority-class samples that have not yet been used to train the classifier; the classifier's average classification error rate is computed over the samples in each cluster; and the representative points of the few clusters with the highest average error rates are added to the balanced sample set. At the same time, a number of minority-class samples approximately equal to the number of majority-class samples newly added in that iteration are drawn at random without replacement and added to the balanced sample set, and the classifier is retrained on the balanced sample set. Experimental results show that SABER improves both the classification performance on minority-class samples and the overall classification performance.
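The procedure described in the abstract can be illustrated with a minimal, standard-library-only sketch for the two-class case. It is not the authors' implementation, and several details are assumptions made for illustration because the abstract does not specify them: the classifier is a simple k-nearest-neighbour vote (so "retraining" is just using the enlarged balanced set), the representative point of a cluster is taken to be the sample closest to the cluster centroid, and the names `saber`, `n_init`, `n_clusters`, `top`, and `rounds`, together with their default values, are hypothetical.

```python
import math
import random

def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns the non-empty clusters as lists of points."""
    centres = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in points:
            idx = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[idx].append(p)
        centres = [centroid(cl) if cl else centres[i] for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]

def knn_predict(train, x, k=3):
    """Assumed stand-in classifier: majority vote of the k nearest labelled
    samples. Labels: 0 = majority class, 1 = minority class."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0

def saber(majority, minority, n_init=10, n_clusters=5, top=2, rounds=5, seed=0):
    rng = random.Random(seed)
    # Step 1: enlarge the minority class with SMOTE.
    min_pool = list(minority) + smote(minority, len(minority), k=3, rng=rng)
    maj_pool = list(majority)
    rng.shuffle(min_pool)
    rng.shuffle(maj_pool)
    # Step 2: seed the balanced set with part of each class.
    balanced = [(maj_pool.pop(), 0) for _ in range(n_init)]
    balanced += [(min_pool.pop(), 1) for _ in range(n_init)]
    for _ in range(rounds):
        if len(maj_pool) < n_clusters or not min_pool:
            break
        # Step 3: cluster the majority samples not yet used for training.
        clusters = kmeans(maj_pool, n_clusters, rng)

        # Step 4: average error rate per cluster (every true label here is 0).
        def error_rate(cluster):
            return sum(knn_predict(balanced, p) != 0 for p in cluster) / len(cluster)

        hardest = sorted(clusters, key=error_rate, reverse=True)[:top]
        added = 0
        for cluster in hardest:
            centre = centroid(cluster)
            rep = min(cluster, key=lambda p: math.dist(p, centre))  # assumed representative
            maj_pool.remove(rep)
            balanced.append((rep, 0))
            added += 1
        # Step 5: draw as many minority samples, without replacement
        # (the pool was shuffled, so pop() is a random draw).
        for _ in range(min(added, len(min_pool))):
            balanced.append((min_pool.pop(), 1))
    return balanced

# Toy usage on synthetic 2-D data: 200 majority vs. 20 minority samples.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(20)]
balanced = saber(majority, minority)
```

Because the k-NN stand-in has no separate fitting step, the retraining in each iteration is implicit; with a parametric classifier, a refit on `balanced` would go at the end of each round.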
Authors
XIONG Xuan-rui (熊炫睿)
CHEN Gao-sheng (陈高升)
XIONG Lian (熊炼)
ZHANG Yuan (张媛)
CHENG Zhan-wei (程占伟)
FU Ming-kai (付明凯)
College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; Chongqing Institute of Engineering, Chongqing 400056, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 8, pp. 1683-1687 (5 pages)
Funding
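Supported by the Chongqing Basic Science and Frontier Technology Research Program (cstc2017jcyjAX0135, cstc2020jcyj-msxmX0636)
Supported by the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJQN201801908)
Supported by the Scientific Research Start-up Fund of Chongqing University of Posts and Telecommunications (A2015-14)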