
Improved SVM classification algorithm for imbalanced datasets based on neighbor density

Imbalanced dataset classification algorithm based on NDSVM
Abstract: To address the uneven data distribution and indistinct class boundaries characteristic of imbalanced datasets, an improved SVM classification algorithm based on neighbor density (NDSVM) is proposed. The algorithm first computes a neighbor-density value for each sample in the majority class. Based on these values, it selects majority-class samples located in or near the boundary region, equal in number to the minority-class samples, and uses them together with the minority class to train an initial SVM classifier. Finally, the resulting support vectors and the remaining majority-class samples are used to iteratively refine the initial classifier. Experimental results on synthetic and UCI datasets show that, compared with the WSVM, ALSMOTE-SVM, and standard SVM algorithms, the proposed algorithm achieves good classification results and effectively improves the performance of SVM on datasets with uneven distribution and indistinct boundaries.
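The three steps the abstract describes (neighbor-density scoring of the majority class, boundary-sample selection, iterative refinement) can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the density formula (inverse mean distance to the k nearest majority neighbors), the RBF kernel, and the rule for folding remaining majority samples back in (add misclassified ones and retrain) are all assumptions, and the function name `ndsvm_fit` and parameters `k`, `max_iter` are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def ndsvm_fit(X_maj, X_min, k=5, max_iter=10):
    """Sketch of the NDSVM idea: majority class labeled 0, minority 1."""
    # Step 1 (assumed density definition): inverse of the mean distance
    # to the k nearest majority-class neighbors. Sparse samples (low
    # density) are taken to lie in or near the boundary region.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_maj)
    dist, _ = nn.kneighbors(X_maj)          # dist[:, 0] is self-distance (0)
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)

    # Step 2: pick the |minority| lowest-density majority samples and
    # train the initial SVM on them plus the whole minority class.
    order = np.argsort(density)
    sel, rest = order[: len(X_min)], order[len(X_min):]
    X = np.vstack([X_maj[sel], X_min])
    y = np.hstack([np.zeros(len(sel)), np.ones(len(X_min))])
    clf = SVC(kernel="rbf").fit(X, y)

    # Step 3 (assumed refinement rule): feed the remaining majority
    # samples through the current classifier, add the misclassified
    # ones to the training set, and retrain until none remain.
    for _ in range(max_iter):
        if len(rest) == 0:
            break
        wrong = rest[clf.predict(X_maj[rest]) != 0]
        if len(wrong) == 0:
            break
        X = np.vstack([X, X_maj[wrong]])
        y = np.hstack([y, np.zeros(len(wrong))])
        rest = np.setdiff1d(rest, wrong)
        clf = SVC(kernel="rbf").fit(X, y)
    return clf
```

Selecting only as many majority samples as there are minority samples makes the initial training set balanced, which is the point of the selection step; the iteration then guards against discarding informative majority samples outright.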
Author: LIU Yueting (School of Media Engineering, Lanzhou University of Arts and Science, Lanzhou 730000, China)
Source: Journal of Yanbian University (Natural Science Edition), CAS, 2018, No. 1, pp. 43-48
Funding: 2015 Gansu Province Higher Education Research Project (2015B-132)
Keywords: support vector machine; imbalanced dataset; neighbor density; uneven distribution; boundary region
