
Instance Importance Based SVM for Solving Imbalanced Data Classification
(基于实例重要性的SVM解不平衡数据分类)
Cited by: 14
Abstract  In imbalanced data classification, the minority class is the target of interest, yet it is harder to recognize than the majority class. Popular methods have two main drawbacks: importance degrees of instances must be set explicitly, and recognition of the minority class is supported only indirectly. To address these problems, an instance importance based support vector machine (IISVM) is proposed. IISVM works in three phases. The first two phases use a one-class SVM and a binary SVM, respectively, to reorganize the training instances into three tiers: most important, important, and unimportant. In the third phase, the most important instances are used to train the initial classifier, and explicit early-stopping criteria are then applied to support recognition of the minority class directly. Experimental results show that the average classification performance of IISVM is superior to that of current mainstream and advanced methods.
Authors  杨扬, 李善平
Source  《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence), indexed in EI, CSCD, PKU Core; 2009, No. 6, pp. 913-918 (6 pages)
Keywords  Imbalanced Data, Instance Importance, Support Vector Machine, Resampling, Cost-Sensitive Learning
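
The abstract above walks through a three-phase training procedure (one-class SVM scoring, binary SVM refinement, then training on the most important tier with early stopping). The following Python sketch is one plausible reading of that pipeline built on scikit-learn; the combined importance score, the quantile cut-offs, and the omission of the early-stopping loop are assumptions of this sketch, not the paper's actual formulation.

```python
# Hypothetical sketch of the three-phase IISVM workflow described in the abstract.
# The scoring rules, the thresholds, and the skipped early-stopping loop are assumptions.
import numpy as np
from sklearn.svm import OneClassSVM, SVC

def iisvm_sketch(X, y, importance_quantiles=(0.5, 0.8)):
    """Split training data into three importance tiers, then train on the top tier.

    y : labels with the minority class encoded as +1, the majority class as -1.
    importance_quantiles : hypothetical cut-offs separating the three tiers.
    """
    # Phase 1: a one-class SVM fitted on the minority class yields a first
    # importance score (signed distance to the learned minority region).
    ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(X[y == 1])
    score_1 = ocsvm.decision_function(X)

    # Phase 2: a binary SVM refines the score using both classes; points close
    # to the decision boundary are treated as more informative.
    bsvm = SVC(kernel="rbf", C=1.0).fit(X, y)
    score_2 = -np.abs(bsvm.decision_function(X))

    # Reorganize the instances into three tiers by combined importance.
    importance = score_1 + score_2
    q_mid, q_top = np.quantile(importance, importance_quantiles)
    most_important = importance >= q_top
    important = (importance >= q_mid) & ~most_important  # middle tier, kept for later use

    # Phase 3: train the initial classifier on the most important instances only
    # (assumes both classes appear in this tier). The paper additionally applies
    # explicit early-stopping criteria while refining the classifier; that loop
    # is omitted here.
    clf = SVC(kernel="rbf", C=1.0).fit(X[most_important], y[most_important])
    return clf, most_important, important
```

Presumably, later iterations would draw further instances from the middle tier and stop once the early-stopping criteria on minority-class recognition are satisfied; the abstract states the criteria are explicit but does not spell out that loop.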

References (18)

  • 1. Phua C, Alahakoon D, Lee V. Minority Report in Fraud Detection: Classification of Skewed Data. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 50-59.
  • 2. Zheng Zhaohui, Srihari R. Optimally Combining Positive and Negative Features for Text Categorization [EB/OL]. [2003-08-24]. http://www.site.uottawa.ca/~nat/Workshop2003/zheng.pdf.
  • 3. Ertekin S, Huang Jian, Bottou L, et al. Learning on the Border: Active Learning in Imbalanced Data Classification [EB/OL]. [2007-11-08]. http://www.personal.psu.edu/juh177/pubs/CIKM2007.pdf.
  • 4. Kubat M, Matwin S. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection // Proc of the 14th International Conference on Machine Learning. Nashville, USA, 1997: 179-186.
  • 5. Barandela R, Valdovinos R M, Sanchez J S, et al. The Imbalanced Training Sample Problem: Under or over Sampling // Proc of the Joint IAPR International Workshops on Structural, Syntactic and Statistical Pattern Recognition. Lisbon, Portugal, 2004: 806-814.
  • 6. Chawla N V, Hall L O, Bowyer K W, et al. SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • 7. Han Hui, Wang Wenyuan, Mao Binghuan. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning // Proc of the International Conference on Intelligent Computing. Hefei, China, 2005: 878-887.
  • 8. Jo T, Japkowicz N. Class Imbalances versus Small Disjuncts. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 40-49.
  • 9. Hulse J V, Khoshgoftaar T M, Napolitano A. Experimental Perspectives on Learning from Imbalanced Data // Proc of the 24th International Conference on Machine Learning. Corvallis, USA, 2007: 935-942.
  • 10. Geibel P, Wysotzki F. Perceptron Based Learning with Example Dependent and Noisy Costs // Proc of the International Conference on Machine Learning. Washington, USA, 2003: 218-225.

Co-cited references (59)

  • 1. 林舒杨, 李翠华, 江弋, 林琛, 邹权. Research on under-sampling methods for imbalanced data [J]. 计算机研究与发展, 2011, 48(S3): 47-53. Cited by: 31.
  • 2. 蒋盛益, 谢照青, 余雯. Research on cost-sensitive naive Bayes classification of imbalanced data [J]. 计算机研究与发展, 2011, 48(S1): 387-390. Cited by: 21.
  • 3. 张翔, 肖小玲, 徐光祐. Fuzzy support vector machine based on affinity among samples [J]. 软件学报, 2006, 17(5): 951-958. Cited by: 84.
  • 4. 吴洪兴, 彭宇, 彭喜元. A support vector machine method for imbalanced sample data processing [J]. 电子学报, 2006, 34(B12): 2395-2398. Cited by: 17.
  • 5. Han Hui, Wang Wenyuan, Mao Bing-huan. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning [C] // Proc of the International Conference on Intelligent Computing (ICIC'05), 2005: 878-887.
  • 6. Van Hulse J, Khoshgoftaar T. Knowledge discovery from imbalanced and noisy data [J]. Data & Knowledge Engineering, 2009, 68: 1513-1542.
  • 7. Hart P E. The condensed nearest neighbor rule [J]. IEEE Transactions on Information Theory, 1968, 14(3): 515-516.
  • 8. Wilson D R, Martinez T R. Reduction techniques for instance-based learning algorithms [J]. Machine Learning, 2000, 38: 257-286.
  • 9. Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: synthetic minority over-sampling technique [J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • 10. Wang Chin Heng, Lee Lam Hong, Rajkumar R, et al. A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine [J]. Expert Systems with Applications, 2012, 39: 11880-11888.

Citing articles (14)

Secondary citing articles (144)
