Research on Data Mining Method for Imbalanced Dataset Based on Improved SMOTE

Times cited: 31
Abstract: The synthetic minority over-sampling technique (SMOTE) is a relatively recent over-sampling method that effectively addresses the problem of classifying imbalanced data. However, SMOTE generates synthetic samples somewhat blindly. This paper therefore proposes an improved over-sampling method, adaptive SMOTE (ASMOTE), which adaptively adjusts SMOTE's neighbor selection strategy according to the internal distribution of the dataset in order to control the quality of the synthetic samples. Algorithmic analysis and simulation results show that the proposed method effectively improves the overall classification accuracy without increasing computational complexity.
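
The baseline SMOTE step that the abstract builds on can be sketched as follows: each synthetic sample is x_new = x + δ(x̂ − x), where x̂ is one of the k nearest minority-class neighbors of a minority sample x and δ is drawn uniformly from [0, 1). This is a minimal, stdlib-only Python sketch of standard SMOTE, not the paper's ASMOTE, whose exact neighbor selection rule is not given in this abstract. The function names, the `choose` hook, and the defaults (k = 5, fixed seed) are hypothetical choices for illustration; `choose` only marks where a neighbor selection strategy, such as the one ASMOTE adapts, would plug in.

```python
import math
import random
from typing import Callable, List, Sequence

Point = List[float]
# Hypothetical hook: a neighbor selection strategy receives the indices of the
# k nearest minority neighbors and an RNG, and returns the index to interpolate toward.
Chooser = Callable[[List[int], random.Random], int]


def k_nearest(idx: int, data: Sequence[Point], k: int) -> List[int]:
    """Indices of the k nearest points to data[idx] (Euclidean), excluding idx."""
    ranked = sorted(
        (math.dist(data[idx], other), j)
        for j, other in enumerate(data)
        if j != idx
    )
    return [j for _, j in ranked[:k]]


def uniform_choice(candidates: List[int], rng: random.Random) -> int:
    """Standard SMOTE: pick one of the k minority neighbors uniformly at random."""
    return rng.choice(candidates)


def smote(
    minority: Sequence[Point],
    n_per_sample: int,
    k: int = 5,
    choose: Chooser = uniform_choice,
    seed: int = 0,
) -> List[Point]:
    """Generate n_per_sample synthetic points for every minority sample."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for i, x in enumerate(minority):
        # Neighbors are searched among minority samples only, as in standard SMOTE.
        neighbors = k_nearest(i, minority, k)
        for _ in range(n_per_sample):
            x_hat = minority[choose(neighbors, rng)]
            delta = rng.random()  # uniform in [0, 1)
            # x_new = x + delta * (x_hat - x): a point on the segment from x to x_hat
            synthetic.append([a + delta * (b - a) for a, b in zip(x, x_hat)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    new_points = smote(minority, n_per_sample=2, k=3)
    print(len(new_points))  # 8 (4 samples x 2 each)
```

Because the default `uniform_choice` ignores where majority-class samples lie around x, it illustrates one plausible reading of the "blindness" the abstract criticizes. An adaptive method would differ by supplying a distribution-aware `choose` function; this sketch does not attempt to reproduce ASMOTE's rule.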
Source: Acta Electronica Sinica (电子学报), 2007, Issue B12, pp. 22-26 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: imbalanced dataset; synthetic minority over-sampling technique (SMOTE); adaptive SMOTE; synthetic samples; neighbor selection strategy