
Instance Importance Based SVM for Solving Imbalanced Data Classification
(基于实例重要性的SVM解不平衡数据分类)
Cited by: 14
Abstract  In imbalanced data classification, the minority class is the target of interest, yet it is harder to recognize than the majority class. Popular methods have two main drawbacks: importance degrees of instances must be set explicitly, and recognition of the minority class is supported only indirectly. To address these problems, an instance importance based support vector machine (IISVM) is proposed. IISVM works in three phases. The first two phases use a one-class SVM and a binary SVM, respectively, to reorganize the training instances into three tiers: most important, important, and unimportant. In the third phase, the most important instances are used to train the initial classifier, and explicit early-stopping criteria are then applied to support recognition of the minority class directly. Experimental results show that the average classification performance of IISVM is superior to that of current mainstream and advanced methods.
Authors  杨扬, 李善平
Source  《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence), indexed in EI, CSCD, PKU Core; 2009, No. 6, pp. 913-918 (6 pages)
Keywords  Imbalanced Data, Instance Importance, Support Vector Machine, Resampling, Cost-Sensitive Learning
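
The abstract above walks through a three-phase training procedure (one-class SVM scoring, binary SVM refinement, then training on the most important tier with early stopping). The following Python sketch is one plausible reading of that pipeline built on scikit-learn; the combined importance score, the quantile cut-offs, and the omission of the early-stopping loop are assumptions of this sketch, not the paper's actual formulation.

```python
# Hypothetical sketch of the three-phase IISVM workflow described in the abstract.
# The scoring rules, the thresholds, and the skipped early-stopping loop are assumptions.
import numpy as np
from sklearn.svm import OneClassSVM, SVC

def iisvm_sketch(X, y, importance_quantiles=(0.5, 0.8)):
    """Split training data into three importance tiers, then train on the top tier.

    y : labels with the minority class encoded as +1, the majority class as -1.
    importance_quantiles : hypothetical cut-offs separating the three tiers.
    """
    # Phase 1: a one-class SVM fitted on the minority class yields a first
    # importance score (signed distance to the learned minority region).
    ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X[y == 1])
    score_1 = ocsvm.decision_function(X)

    # Phase 2: a binary SVM refines the score using both classes; points close
    # to the decision boundary are treated as more informative.
    bsvm = SVC(kernel="rbf", C=1.0).fit(X, y)
    score_2 = -np.abs(bsvm.decision_function(X))

    # Reorganize the instances into three tiers by combined importance.
    importance = score_1 + score_2
    q_mid, q_top = np.quantile(importance, importance_quantiles)
    most_important = importance >= q_top
    important = (importance >= q_mid) & ~most_important  # middle tier, kept for later use

    # Phase 3: train the initial classifier on the most important instances only
    # (assumes both classes appear in this tier). The paper additionally applies
    # explicit early-stopping criteria while refining the classifier; that loop
    # is omitted here.
    clf = SVC(kernel="rbf", C=1.0).fit(X[most_important], y[most_important])
    return clf, most_important, important
```

Presumably, later iterations would draw further instances from the middle tier and stop once the early-stopping criteria on minority-class recognition are satisfied; the abstract states the criteria are explicit but does not spell out that loop.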

References (18)

  • 1. Phua C, Alahakoon D, Lee V. Minority Report in Fraud Detection: Classification of Skewed Data. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 50-59.
  • 2. Zheng Zhaohui, Srihari R. Optimally Combining Positive and Negative Features for Text Categorization [EB/OL]. [2003-08-24]. http://www.site.uottawa.ca/~nat/Workshop2003/zheng.pdf.
  • 3. Ertekin S, Huang Jian, Bottou L, et al. Learning on the Border: Active Learning in Imbalanced Data Classification [EB/OL]. [2007-11-08]. http://www.personal.psu.edu/juh177/pubs/CIKM2007.pdf.
  • 4. Kubat M, Matwin S. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection // Proc of the 14th International Conference on Machine Learning. Nashville, USA, 1997: 179-186.
  • 5. Barandela R, Valdovinos R M, Sanchez J S, et al. The Imbalanced Training Sample Problem: Under or over Sampling // Proc of the Joint IAPR International Workshops on Structural, Syntactic and Statistical Pattern Recognition. Lisbon, Portugal, 2004: 806-814.
  • 6. Chawla N V, Hall L O, Bowyer K W, et al. SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • 7. Han Hui, Wang Wenyuan, Mao Binghuan. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning // Proc of the International Conference on Intelligent Computing. Hefei, China, 2005: 878-887.
  • 8. Jo T, Japkowicz N. Class Imbalances versus Small Disjuncts. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 40-49.
  • 9. Hulse J V, Khoshgoftaar T M, Napolitano A. Experimental Perspectives on Learning from Imbalanced Data // Proc of the 24th International Conference on Machine Learning. Corvallis, USA, 2007: 935-942.
  • 10. Geibel P, Wysotzki F. Perceptron Based Learning with Example Dependent and Noisy Costs // Proc of the International Conference on Machine Learning. Washington, USA, 2003: 218-225.

Co-cited references (59)

  • 1. 林舒杨, 李翠华, 江弋, 林琛, 邹权. Research on under-sampling methods for imbalanced data [J]. 计算机研究与发展, 2011, 48(S3): 47-53. Cited by: 31.
  • 2. 蒋盛益, 谢照青, 余雯. Research on cost-sensitive naive Bayes classification of imbalanced data [J]. 计算机研究与发展, 2011, 48(S1): 387-390. Cited by: 21.
  • 3. 张翔, 肖小玲, 徐光祐. Fuzzy support vector machine based on affinity among samples [J]. 软件学报, 2006, 17(5): 951-958. Cited by: 84.
  • 4. 吴洪兴, 彭宇, 彭喜元. A support vector machine method for imbalanced sample data processing [J]. 电子学报, 2006, 34(B12): 2395-2398. Cited by: 17.
  • 5. Han Hui, Wang Wenyuan, Mao Bing-huan. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning [C] // Proc of the International Conference on Intelligent Computing (ICIC'05), 2005: 878-887.
  • 6. Van Hulse J, Khoshgoftaar T. Knowledge discovery from imbalanced and noisy data [J]. Data & Knowledge Engineering, 2009, 68: 1513-1542.
  • 7. Hart P E. The condensed nearest neighbor rule [J]. IEEE Transactions on Information Theory, 1968, 14(3): 515-516.
  • 8. Wilson D R, Martinez T R. Reduction techniques for instance-based learning algorithms [J]. Machine Learning, 2000, 38: 257-286.
  • 9. Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • 10. Wang Chin Heng, Lee Lam Hong, Rajkumar R, et al. A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine [J]. Expert Systems with Applications, 2012, 39: 11880-11888.

Citing articles (14)

Secondary citing articles (144)
