
New oversampling algorithm DB_SMOTE (Cited by: 12)
Abstract: To address the asymmetry of class distribution information in imbalanced datasets, a new oversampling algorithm, DB_SMOTE (Distance-based Synthetic Minority Over-sampling Technique), is proposed. It compensates for the shortage of minority samples by synthesizing new minority-class samples. The algorithm extracts seed samples according to the distance between each sample and its class centre, combined with the degree of class aggregation. Following the idea of SMOTE (Synthetic Minority Over-sampling Technique), new minority-class samples are synthesized from the seed samples. The distribution function of the new samples is constructed from the distance between each seed sample and the minority class centre. Classification experiments on several datasets using this sampling algorithm show that DB_SMOTE is feasible.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 6, pp. 92-95 (4 pages)
Funding: National Natural Science Foundation of China (No.61300170, No.71371012); Humanities and Social Sciences Fund of the Ministry of Education (No.13YJA630098); Key Project of the Natural Science Foundation of Anhui Province (No.KJ2013A040); Key Project of the Provincial Outstanding Young Talents Fund for Universities (No.2013SQRL034ZD); University Youth Fund (No.2013YQ31, No.2012YQ32)
Keywords: imbalanced data learning; oversampling; data classification
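The abstract gives the outline of the method but not its formulas, so the following is only a minimal sketch of a distance-based oversampler in the same spirit, not the authors' DB_SMOTE. Everything specific in it is an assumption made for illustration: the function name `db_smote_sketch`, the seed rule (samples farther from the minority centre than the mean distance), the sampling weights (proportional to that distance), and the choice to interpolate between a seed and the class centre.

```python
import math
import random


def db_smote_sketch(minority, n_new, seed=None):
    """Illustrative distance-based oversampling (hypothetical, not the paper's exact rules)."""
    rng = random.Random(seed)
    dim = len(minority[0])

    # Centre of the minority class: per-feature mean.
    center = [sum(x[j] for x in minority) / len(minority) for j in range(dim)]

    # Euclidean distance of every minority sample to the class centre.
    dist = [math.dist(x, center) for x in minority]
    mean_dist = sum(dist) / len(dist)

    # ASSUMPTION: seed samples are those farther from the centre than average.
    seeds = [(x, d) for x, d in zip(minority, dist) if d > mean_dist]
    if not seeds:  # e.g. all samples equidistant from the centre
        seeds = list(zip(minority, dist))

    # ASSUMPTION: a seed is drawn with probability proportional to its distance.
    weights = [d for _, d in seeds]
    if sum(weights) == 0:  # all samples coincide with the centre
        weights = [1.0] * len(seeds)

    synthetic = []
    for x, _ in rng.choices(seeds, weights=weights, k=n_new):
        # SMOTE-style interpolation; ASSUMPTION: the second endpoint is the centre.
        gap = rng.random()
        synthetic.append([xj + gap * (cj - xj) for xj, cj in zip(x, center)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
new_samples = db_smote_sketch(minority, n_new=5, seed=0)
```

In this toy input only `[4.0, 4.0]` lies farther from the centre than the mean distance, so every synthetic point falls on the segment between that sample and the centre `[1.25, 1.25]`. The seed rule and weighting would need to be replaced with the definitions in the full paper to reproduce the published results.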

References (14)

  • 1 He Haibo, Garcia E A. Learning from imbalanced data[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
  • 2 Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: Synthetic Minority Over-sampling Technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
  • 3 Han H, Wang W Y, Mao B H. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning[C]//LNCS 3644: ICIC 2005, Part I, 2005: 878-887.
  • 4 He H, Bai Y, Garcia E A, et al. ADASYN: adaptive synthetic sampling approach for imbalanced learning[C]//Proc of the International Joint Conference on Neural Networks, 2008: 1322-1328.
  • 5 Jo T, Japkowicz N. Class imbalances versus small disjuncts[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 40-49.
  • 6 Cheng Xianfeng, Li Jun, Li Xiongfei. An imbalanced data classification algorithm based on undersampling[J]. Computer Engineering (计算机工程), 2011, 37(13): 147-149. Cited by: 20
  • 7 Chawla N V, Lazarevic A, Hall L O, et al. SMOTEBoost: improving prediction of the minority class in boosting[C]//Proc of the 7th European Conf Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 2003: 107-119.
  • 8 Seiffert C, Khoshgoftaar T M, Hulse J V, et al. RUSBoost: improving classification performance when training data is skewed[C]//Proc of the 19th IEEE International Conference on Pattern Recognition, Tampa, FL, USA, 2008: 1-4.
  • 9 Li Xiongfei, Li Jun, Dong Yuanfang, Qu Chengwei. A new imbalanced data learning algorithm PCBoost[J]. Chinese Journal of Computers (计算机学报), 2012, 35(2): 202-209. Cited by: 62
  • 10 Yu Chongchong, Tian Rui, Tan Li, Tu Xuyan. An ensemble transfer learning algorithm for imbalanced sample classification[J]. Acta Electronica Sinica (电子学报), 2012, 40(7): 1358-1363. Cited by: 26

