
Improved Borderline-SMOTE oversampling method for boundary classification
Abstract To address the classification errors that class-overlap regions tend to cause in imbalanced data, this paper proposes an improved Borderline-SMOTE oversampling method (IBSM) that introduces a synthesis factor to improve boundary classification. First, the minority samples lying on the boundary are identified from the distribution of each minority sample's nearest neighbors. Then, a synthesis factor is computed for each boundary sample, and the number of samples it should generate is updated according to the factor's value. Finally, guided by the synthesis factor, the top-Z nearest minority samples among the neighbors are selected to generate the new samples. The proposed method is compared with eight sampling methods using two classifiers, KNN and SVM, on 10 KEEL imbalanced datasets. The results show that it achieves the best F1, G-mean, and AUC (Area Under Curve) on most datasets, and the best Friedman rankings for F1 and AUC, indicating that it handles boundary-sample classification in imbalanced data better than the other sampling methods. Setting constraints and an allocation strategy through the synthesis factor may offer ideas for similar research.
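The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' IBSM implementation: the abstract does not give the synthesis-factor formula, so the factor below (the share of minority samples among a boundary sample's neighbors) is a placeholder assumption, as are the function names `borderline_mask` and `oversample` and the parameters `m`, `z`, and `n_base`. The boundary test follows the standard Borderline-SMOTE "danger" rule (at least half, but not all, of the m nearest neighbors belong to the majority class).

```python
import numpy as np

def borderline_mask(X, y, minority=1, m=5):
    """Flag minority samples in the Borderline-SMOTE 'danger' set:
    at least half, but not all, of the m nearest neighbours are majority."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :m]        # indices of the m nearest neighbours
    n_maj = (y[nn] != minority).sum(axis=1)  # majority count among neighbours
    danger = (y == minority) & (n_maj >= m / 2) & (n_maj < m)
    return danger, n_maj

def oversample(X, y, minority=1, m=5, z=3, n_base=2, seed=0):
    """Generate synthetic minority samples around boundary samples."""
    rng = np.random.default_rng(seed)
    danger, n_maj = borderline_mask(X, y, minority, m)
    min_idx = np.flatnonzero(y == minority)
    new = []
    for i in np.flatnonzero(danger):
        # ASSUMPTION: placeholder synthesis factor, not the paper's formula.
        factor = 1.0 - n_maj[i] / m
        # Update the number of samples this boundary sample generates.
        n_new = max(1, int(round(n_base * factor)))
        # Pick the z nearest minority samples (hypothetical fixed top-Z).
        others = min_idx[min_idx != i]
        if others.size == 0:
            continue
        dist = np.linalg.norm(X[others] - X[i], axis=1)
        cand = others[np.argsort(dist)[:z]]
        for _ in range(n_new):
            j = rng.choice(cand)
            gap = rng.random()               # SMOTE interpolation weight in [0, 1)
            new.append(X[i] + gap * (X[j] - X[i]))
    return np.array(new).reshape(-1, X.shape[1])
```

Each synthetic point lies on the segment between a boundary sample and one of its nearest minority neighbors, so restricting the candidates to the closest Z keeps new samples near the minority side of the boundary. In the paper, both the per-sample count and the top-Z selection are driven by the synthesis factor; here Z is fixed for simplicity.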
Authors Ma He; Song Mei; Zhu Yi (School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, 221116, China; Management Science and Technology Center, Jiangsu Normal University, Xuzhou, 221116, China)
Source Journal of Nanjing University (Natural Science), 2023, No. 6, pp. 1003-1012 (10 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding National Natural Science Foundation of China (71503108, 62077029); CCF-Huawei Innovation Research Program (CCF-HuaweiFM202209); Jiangsu Normal University Research and Practice Innovation Project (2022XKT1540)
Keywords imbalanced data; boundary sample; class overlap; Borderline-SMOTE; oversampling