Classification of Unbalanced Data Based on SVM Mixed Sampling (cited by: 12)
Abstract: When a traditional support vector machine (SVM) is used to classify unbalanced data, the true minority-class support vectors are too few and hard to identify, so classification performance is unsatisfactory. To address this problem, an unbalanced data classification method based on SVM mixed sampling (BSMS) is proposed. The method first takes the original unbalanced data, after classification by an SVM, and partitions it by sample location into three regions: the support vector region (SV), the majority-class non-support-vector region (MNSV), and the minority-class non-support-vector region (FNSV). Samples in the MNSV and FNSV regions are then denoised. Next, the minority-class samples in the SV region that are misclassified, together with some that are correctly classified but lie close to the decision boundary, are oversampled repeatedly until the training set giving the best test result is found. Finally, some of the samples in the MNSV region are selectively deleted at random. Experimental results show that the method outperforms other sampling methods.
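The three-region split and the two resampling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' BSMS implementation. It assumes that signed SVM decision values are already available, that a sample with y·f(x) ≤ 1 counts as lying in the support vector region, that oversampling is plain duplication, and that a fixed fraction of MNSV samples is dropped. The function names and the parameters `copies` and `drop_frac` are hypothetical. The denoising step and the search for the best training set are omitted because the abstract does not specify them.

```python
import random


def partition_regions(scores, labels, minority=1):
    """Split sample indices into SV, MNSV and FNSV regions.

    scores: signed decision values f(x); labels: +1 or -1.
    Assumption: y * f(x) <= 1 (on or inside the margin) marks the SV region.
    """
    sv, mnsv, fnsv = [], [], []
    for i, (f, y) in enumerate(zip(scores, labels)):
        if y * f <= 1.0:
            sv.append(i)
        elif y == minority:
            fnsv.append(i)
        else:
            mnsv.append(i)
    return sv, mnsv, fnsv


def mixed_sample(scores, labels, minority=1, copies=1, drop_frac=0.5, seed=0):
    """Return resampled training indices; a repeated index means oversampling."""
    rng = random.Random(seed)
    sv, mnsv, fnsv = partition_regions(scores, labels, minority)
    # Oversample minority samples in the SV region (assumed: simple duplication).
    extra = [i for i in sv if labels[i] == minority] * copies
    # Randomly delete a fraction of the majority non-support-vector samples.
    keep_n = len(mnsv) - int(len(mnsv) * drop_frac)
    kept_mnsv = rng.sample(mnsv, keep_n)
    return sorted(sv + extra + kept_mnsv + fnsv)


# Toy data: five majority samples (-1) and three minority samples (+1).
scores = [-2.5, -1.8, -3.0, -1.2, -0.4, 0.3, -0.6, 1.7]
labels = [-1, -1, -1, -1, -1, 1, 1, 1]
sv, mnsv, fnsv = partition_regions(scores, labels)
resampled = mixed_sample(scores, labels)
```

On this toy data the majority count falls from five to three and the minority count rises from three to five. In practice `copies` and `drop_frac` would be tuned against a validation set, which is where the abstract's "repeat until the best training set is found" loop would sit.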
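Authors: JIANG Fei; YANG Ming; LIU Yu-xin (School of Science, North University of China, Taiyuan 030051, China)
Affiliation: School of Science, North University of China
Source: Mathematics in Practice and Theory (《数学的实践与认识》), 2021, No. 1, pp. 88-96 (9 pages)
Funding: National Natural Science Foundation of China (61971381); Natural Science Foundation of Shanxi Province (201801D121158).
Keywords: unbalanced data; support vector machine (SVM); oversampling; undersampling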