
Active Learning SMOTE Based Imbalanced Data Classification (cited by 23)
Abstract: The Synthetic Minority Over-sampling Technique (SMOTE) is a typical over-sampling method for data preprocessing. It can effectively balance imbalanced data, but it also introduces noise and other problems that reduce classification accuracy. To address this, the paper draws on the classification performance of active learning support vector machines (SVMs) and proposes ALSMOTE, an imbalanced data classification method based on active learning SMOTE. Because an active learning SVM uses a distance-based strategy to actively select the most informative samples, it can pick out the valuable majority class samples in imbalanced data and discard those of little value, which improves computational efficiency and mitigates the problems introduced by SMOTE. The method first applies SMOTE to balance a small portion of the samples and obtain an initial classifier, then uses an active learning strategy to adjust the classifier's accuracy. Experimental results show that the method effectively improves classification accuracy on imbalanced data.
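The two building blocks named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's ALSMOTE implementation: the function names `smote` and `closest_to_hyperplane` are hypothetical, the weight vector `w` and bias `b` are assumed to come from an already trained linear SVM, and picking the sample closest to the hyperplane is the usual distance-based active learning criterion, which the abstract does not spell out in detail.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority points by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point lies on the segment between x and the chosen neighbour
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def closest_to_hyperplane(pool, w, b):
    """Index of the pool sample nearest to the hyperplane w.x + b = 0.

    Assumption: w and b are taken from a trained linear SVM.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))

    def distance(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

    return min(range(len(pool)), key=lambda i: distance(pool[i]))
```

In an active learning loop along the lines described above, one would presumably retrain the SVM after adding each selected sample to the training set and repeat until a stopping condition is met; that loop is omitted here because the abstract does not give its details.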
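Source: Computer Applications and Software (计算机应用与软件), CSCD, Peking University Core Journal, 2012, No. 3, pp. 91-93, 162 (4 pages)
Funding: National Natural Science Foundation of China (10771092); Doctoral Start-up Fund of the Liaoning Provincial Department of Science and Technology (20081079); Dalian Science and Technology Fund (2010J21DW019)
Keywords: Active learning; Imbalanced data set; Synthetic Minority Over-sampling Technique (SMOTE); Support vector machine (SVM)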

References (10)

  • 1. Qian Hongbo, He Guangnan. An overview of imbalanced-class data classification [J]. Computer Engineering & Science, 2010, 32(5): 85-88. (In Chinese; cited by 17)
  • 2. He Haibo, Garcia E A. Learning from Imbalanced Data [J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
  • 3. Chawla N V, Bowyer K, Hall L, et al. SMOTE: Synthetic Minority Over-sampling Technique [J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • 4. Gu Qiong, Cai Zhihua, Zhu Li, et al. Data Mining on Imbalanced Data Sets [C]// International Conference on Advanced Computer Theory and Engineering, 2008: 1020-1024.
  • 5. Lewis D, Gale W. A Sequential Algorithm for Training Text Classifiers [C]// Proc of the 17th Annual International ACM SIGIR Conf on Research and Development in Information Retrieval, 1994: 3-12.
  • 6. Zhang Jianpei, Xu Hua. Research and application of support vector machine (SVM) active learning methods [J]. Journal of Computer Applications, 2004, 24(1): 1-3. (In Chinese; cited by 51)
  • 7. Liu Xuying, Wu Jianxin, Zhou Zhihua. Undersampling for Class-Imbalance Learning [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(2): 539-550.
  • 8. Tan Pangning, Steinbach M, Kumar V. Introduction to Data Mining [M]. Fan Ming, Fan Hongjian, trans. Beijing: Posts & Telecom Press, 2006: 241-327. (Chinese edition)
  • 9. Fawcett T. ROC Graphs: Notes and Practical Considerations for Researchers [R]. HP Labs, Palo Alto, CA, Tech. Rep. HPL-2003-4, 2003.
  • 10. Guo H, Viktor H L. Learning from Imbalanced Data Sets with Boosting and Data Generation: The DataBoost-IM Approach [J]. ACM SIGKDD Explorations, 2004, 6(1): 30-39.
