
Resampling Algorithm for Unbalanced Data (面向不均衡数据的重采样算法) Cited by: 1
Abstract: Undersampling may discard too much useful information, while the Synthetic Minority Over-sampling Technique (SMOTE) may introduce too much noise. To address both problems, an improved SMOTE algorithm is proposed. The algorithm first uses a clustering algorithm to partition the minority class into several clusters, randomly selects several samples within a cluster to synthesize an intermediate sample point, and then combines that point with the cluster center to synthesize a new sample. Random Under-Sampling (RUS) is then combined with the improved SMOTE algorithm to form the RUCSMOTE algorithm. RUCSMOTE first applies random undersampling according to the current imbalance ratio of the samples, then applies the improved SMOTE algorithm to oversample the minority class, and finally obtains a balanced dataset. Theoretical analysis indicates that RUCSMOTE combines the advantages of the two algorithms: it reduces the risk of overfitting and also reduces the majority-class information lost to undersampling. Experiments on 20 KEEL imbalanced datasets show that, for imbalanced classification, the evaluation metrics AUC and GM generally improve by 2 to 7 percentage points relative to seven other resampling algorithms.
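The two-stage procedure described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not specify the clustering algorithm, how the intermediate point is formed, or how the undersampling amount is derived from the imbalance ratio. The sketch assumes a small NumPy k-means, a random convex combination of `m` picked samples as the intermediate point, linear interpolation toward the cluster center, and a hypothetical `rus_share` parameter that sets how much of the class gap is closed by undersampling. All function and parameter names are made up for illustration.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=20):
    """Tiny Lloyd's k-means (assumed clustering step); returns (labels, centers)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels, centers


def cluster_smote(X_min, n_new, k=3, m=2, rng=None):
    """Synthesize n_new minority samples cluster by cluster (assumed details)."""
    rng = np.random.default_rng() if rng is None else rng
    if n_new <= 0:
        return np.empty((0, X_min.shape[1]))
    k = min(k, len(X_min))
    labels, centers = kmeans(X_min, k, rng)
    nonempty = [j for j in range(k) if np.any(labels == j)]
    new_points = []
    for _ in range(n_new):
        j = rng.choice(nonempty)
        members = X_min[labels == j]
        size = min(m, len(members))
        picks = members[rng.choice(len(members), size=size, replace=False)]
        # Intermediate point: random convex combination of the picked samples.
        weights = rng.dirichlet(np.ones(size))
        mid = weights @ picks
        # New sample: random point on the segment from mid to the cluster center.
        new_points.append(mid + rng.random() * (centers[j] - mid))
    return np.array(new_points)


def rucsmote(X, y, minority=1, rus_share=0.5, k=3, rng=None):
    """Random undersampling followed by cluster-based oversampling.

    Assumes the minority class is not larger than the rest of the data
    and that 0 <= rus_share <= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    X_min = X[y == minority]
    idx_maj = np.flatnonzero(y != minority)
    n_min, n_maj = len(X_min), len(idx_maj)
    # Step 1: drop majority samples to close rus_share of the class gap.
    target = n_maj - int(rus_share * (n_maj - n_min))
    keep = rng.choice(idx_maj, size=target, replace=False)
    # Step 2: oversample the minority class up to the same size.
    X_new = cluster_smote(X_min, target - n_min, k=k, rng=rng)
    X_out = np.vstack([X[keep], X_min, X_new])
    y_out = np.concatenate([y[keep], np.full(n_min + len(X_new), minority)])
    return X_out, y_out
```

Because every synthetic point lies between a convex combination of cluster members and that cluster's mean, it stays inside the convex hull of its cluster, which is one plausible way to read the abstract's claim of reduced noise compared with plain SMOTE.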
Authors: ZHU Shen (朱深); XU Hua (徐华); CHENG Jinhai (成金海) (College of Artificial Intelligence and Computer, Jiangnan University, Wuxi 214122, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 3, pp. 542-548 (7 pages)
Funding: Supported by the China Postdoctoral Science Foundation (2018T110441).
Keywords: imbalanced data; clustering; oversampling; undersampling
Related literature: references 10; second-level references 41; co-citing documents 135; co-cited documents 21; citing documents 1
