SVM Learning Algorithm Used in Imbalance Data (一种用于非平衡数据的SVM学习算法)

Cited by: 7

Abstract: In practical applications, classification data are often imbalanced: one class is rare relative to the other, and misclassifying the rare (minority) class can cost far more than misclassifying the majority class. Classification performance must therefore account for misclassification cost as well as accuracy. This paper extends the Support Vector Machine (SVM) learning method: with a Gaussian kernel, it applies separate penalty parameters C+ (the weight assigned to the minority class) and C- (the weight assigned to the majority class) to obtain a more sensitive hyperplane, and it uses a genetic algorithm to tune the SVM learning parameters. A new evaluation function is introduced to assess the quality of the classification results during optimization. Experimental results show that the algorithm performs well on imbalanced data and predicts minority-class samples with high accuracy.
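The abstract does not reproduce the optimization problem itself. As a sketch only, the standard class-weighted soft-margin SVM with two penalty parameters can be written as below. The symbols \phi (feature map), \xi_i (slack variables), \alpha_i (dual variables) and \gamma (kernel width) are assumptions introduced here for illustration, as is the convention that y_i = +1 marks the minority class; the paper's exact formulation may differ.

```latex
% Primal problem: separate penalties for minority (+1) and majority (-1) errors
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2}
  + C^{+}\sum_{i:\,y_i=+1}\xi_i
  + C^{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad & y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,
  \qquad \xi_i \ge 0,\quad i=1,\dots,n .
\end{aligned}

% Gaussian kernel used in place of the explicit feature map
K(x_i,x_j)=\phi(x_i)^{\top}\phi(x_j)=\exp\!\bigl(-\gamma\,\lVert x_i-x_j\rVert^{2}\bigr)

% In the dual, the two penalties become class-dependent box constraints
0\le\alpha_i\le C^{+}\ \ (y_i=+1),\qquad 0\le\alpha_i\le C^{-}\ \ (y_i=-1)
```

Under this reading, choosing C+ larger than C- makes errors on minority samples more expensive and so pushes the hyperplane away from the minority class. The quantities left for the genetic algorithm to search would then be C+, C- and \gamma, scored by the evaluation function the abstract mentions.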

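Authors: 蒋莎 (Jiang Sha), 张晓龙 (Zhang Xiaolong)

Affiliation: School of Computer Science, Wuhan University of Science and Technology (武汉科技大学计算机学院)

Source: Computer Engineering (《计算机工程》), 2008, No. 20, pp. 198-199, 202 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Keywords: Support Vector Machine (SVM); imbalanced data; evaluation (measure) function; learning parameter optimization

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]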


