Abstract
Traditional resampling methods mostly use fixed sampling strategies and cannot adapt the sampling strategy to the optimization needs of the model. To address this problem, this paper proposes an adaptive sampling-based imbalanced classification method (ASIC). The method dynamically adjusts the sampling probabilities of the different classes in the training set according to the classification model's performance on the validation set, so that each class's sampling probability is determined by the current needs of the model. In addition, the method pays extra attention to the minority classes: all other conditions being equal, it assigns them a higher sampling probability, which compensates for the negative effect of their small sample counts and improves the model's ability to recognize them. Experimental results show that classification models trained with ASIC achieve better mean class accuracy and a better geometric mean of recall than the compared methods, and that the advantage of ASIC grows as the data distribution becomes more imbalanced.
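The abstract does not give the ASIC update rule, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that a class's "need" can be approximated by one minus its validation recall, and that the extra attention to minority classes can be modeled by an inverse-frequency factor. The function names, the `minority_boost` parameter, and both formulas are hypothetical illustrations, not the authors' method.

```python
import random


def class_sampling_probs(val_recall, train_counts, minority_boost=1.0):
    """Return a per-class sampling probability (hypothetical weighting).

    val_recall:   dict mapping class -> recall on the validation set, in [0, 1]
    train_counts: dict mapping class -> number of training samples (> 0)
    """
    max_count = max(train_counts.values())
    weights = {}
    for cls, count in train_counts.items():
        # Assumption: a class the model recognizes poorly needs more samples.
        need = (1.0 - val_recall[cls]) + 1e-6
        # Assumption: rarer classes get a larger factor, all else being equal.
        scarcity = (max_count / count) ** minority_boost
        weights[cls] = need * scarcity
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}


def sample_batch(indices_by_class, probs, batch_size, rng):
    """Draw a batch: pick a class by probability, then a sample uniformly."""
    classes = list(probs)
    chosen = rng.choices(classes, weights=[probs[c] for c in classes], k=batch_size)
    return [rng.choice(indices_by_class[c]) for c in chosen]


# Usage: recompute the probabilities after each validation pass.
train_counts = {"majority": 900, "minority": 100}
val_recall = {"majority": 0.9, "minority": 0.9}
probs = class_sampling_probs(val_recall, train_counts)
indices_by_class = {"majority": list(range(900)), "minority": list(range(900, 1000))}
batch = sample_batch(indices_by_class, probs, 32, random.Random(0))
```

In a training loop, the probabilities would be recomputed whenever new validation results are available, so the sampler follows the model's current weaknesses instead of a fixed schedule. With equal validation recall, the minority class still receives the larger probability, which mirrors the "all other conditions being equal" behavior described in the abstract.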
Authors
陈琼
谢家亮
CHEN Qiong; XIE Jialiang (School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China)
Source
《华南理工大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2022, No. 4, pp. 26-34, 45 (10 pages in total)
Journal of South China University of Technology (Natural Science Edition)
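Funding
National Natural Science Foundation of China (62176095)
Guangdong Province International Cooperation Project (2021A0505030017)
Guangzhou Key Research and Development Program (202103010005)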
Keywords
imbalanced classification
adaptive sampling
recall