
New over-sampling SVM classification algorithm based on unbalanced data sample characteristics (cited by: 25)
Abstract: Traditional sampling methods offer limited accuracy and robustness: under-sampling tends to discard important sample information, while over-sampling tends to introduce redundant information. To address these problems, the unbalanced Pima-Indians dataset from the UCI public repository is taken as an example, and the relationship among the between-class distance, the within-class distance, and the degree of imbalance of the positive and negative samples is considered jointly, leading to a new over-sampling method based on sample characteristics. First, the original dataset is partitioned into distance bands. Then an improved adaptive variable-neighborhood Smote algorithm based on sample characteristics is proposed, which synthesizes new samples among the minority-class samples within each distance band; the method is then extended to five other unbalanced datasets from UCI. Finally, experiments with an SVM classifier show that, on all six unbalanced datasets, the new over-sampling SVM algorithm clearly improves the classification accuracy of both minority-class and majority-class samples compared with existing sampling methods, and is more robust.
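The two steps named in the abstract (partition into distance bands, then synthesize minority samples inside each band) can be illustrated with a minimal sketch. This is not the authors' ANBSC-Smote: the abstract does not define the bands or the adaptive neighborhood rule, so the sketch assumes quantile bands on the distance to the majority-class centroid, plain Smote interpolation with a fixed `k`, and an allocation of new samples proportional to band size. The function names `distance_bands`, `smote_interpolate`, and `oversample_by_band` are hypothetical.

```python
import numpy as np


def distance_bands(X_min, X_maj, n_bands=3):
    """Assign each minority sample to a band by distance to the majority centroid.

    Assumption: quantile-based bands; the paper's band definition may differ.
    """
    centroid = X_maj.mean(axis=0)
    dist = np.linalg.norm(X_min - centroid, axis=1)
    # Interior quantile edges split the distances into n_bands groups.
    edges = np.quantile(dist, np.linspace(0, 1, n_bands + 1)[1:-1])
    return np.digitize(dist, edges)  # band index in 0..n_bands-1


def smote_interpolate(X, n_new, k=5, rng=None):
    """Plain Smote: interpolate between a sample and one of its k nearest neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X)
    if n < 2 or n_new <= 0:
        return np.empty((0, X.shape[1]))
    k = min(k, n - 1)
    # Pairwise distances within the group; the diagonal is masked to skip self-matches.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X[base] + gap * (X[picked] - X[base])


def oversample_by_band(X_min, X_maj, n_bands=3, k=5, seed=0):
    """Synthesize minority samples band by band until the classes are roughly balanced."""
    rng = np.random.default_rng(seed)
    bands = distance_bands(X_min, X_maj, n_bands)
    deficit = len(X_maj) - len(X_min)
    parts = [X_min]
    for b in range(n_bands):
        group = X_min[bands == b]
        # Assumption: new samples are allocated in proportion to band size.
        n_new = int(round(deficit * len(group) / len(X_min)))
        parts.append(smote_interpolate(group, n_new, k, rng))
    return np.vstack(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_maj = rng.normal(0.0, 1.0, size=(40, 2))  # toy majority class
    X_min = rng.normal(2.0, 0.5, size=(10, 2))  # toy minority class
    X_min_balanced = oversample_by_band(X_min, X_maj)
    print(X_min_balanced.shape)
```

The enlarged minority set would then be combined with the majority samples and passed to an SVM classifier (for example scikit-learn's `SVC`), which is the evaluation setup the abstract describes.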
Authors: HUANG Hai-song (黄海松); WEI Jian-an (魏建安); KANG Pei-dong (康佩栋) (Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China)
Source: Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core list; 2018, No. 9, pp. 1549-1558 (10 pages)
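Funding: Guizhou Key Industrial Research Project (黔科合GZ字[2015]3009); Natural Science Foundation of Guizhou Province (黔科合J字[2015]2043); Guizhou Provincial Major Special Project (黔科合JZ字[2014]2001); Guizhou Provincial Department of Education Project (黔教合协同创新字[2015]02); Guizhou University Graduate Innovation Fund (研理工2017037)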
Keywords: unbalanced datasets; sample distance; ANBSC-Smote oversampling; dataset reconstruction; support vector machine (SVM)
