Abstract
When handling imbalanced data, a single traditional classifier makes large errors on the minority class, and its results are often unsatisfactory. To improve classification performance on both the minority class and the data set as a whole, an imbalanced-data classification method that incorporates ensemble ideas is proposed. The method first oversamples the majority-class samples and combines them with the minority-class samples to form a class-balanced data set. Second, starting from the heterogeneity of both the data and the algorithms, it integrates multiple base classifiers and uses the resulting ensemble to change the data distribution. Experimental results show that the method effectively improves the AUC, G-mean, and F-measure of the classifier, with maximum gains on the experimental data sets of 16.7%, 21.9%, and 20.2%, respectively. When handling imbalanced data, the method improves the classifier's performance on the minority class and overall.
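The abstract reports G-mean and F-measure but does not define them. The sketch below assumes the standard binary definitions (G-mean as the geometric mean of recall and specificity, F-measure as the harmonic mean of precision and recall) with the minority class treated as positive. The function name `imbalance_metrics` and the toy labels are illustrative and are not taken from the paper.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (g_mean, f_measure), treating `positive` as the minority class.

    Assumes the standard binary definitions; the paper's exact formulas
    are not given in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    g_mean = math.sqrt(recall * specificity)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return g_mean, f_measure


# Toy example: 2 minority samples, 8 majority samples (hypothetical labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
g_mean, f_measure = imbalance_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} g_mean={g_mean:.3f} f_measure={f_measure:.3f}")
```

In this toy case the accuracy is 0.8 even though only half of the minority samples are recovered. This is why minority-sensitive measures such as G-mean and F-measure are reported for imbalanced data rather than accuracy alone.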
Authors
QIANG Bing-bing (强冰冰)
YIN Hong (尹红)
WANG Rui (王瑞)
Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650550, China
Source
Software Guide (《软件导刊》), 2021, No. 9, pp. 206-212 (7 pages)
Funding
Development Research Project of the People's Government of Yunnan Province (YNDR2017G1C06).
Keywords
imbalanced data
classifier
class boundary
minority class