
A Classification Method for Imbalanced Data Sets Based on Kernel SMOTE
Abstract: This paper presents a classification method based on kernel SMOTE (Synthetic Minority Over-sampling Technique) for the problem of classifying imbalanced data sets with support vector machines (SVMs). The method first oversamples the minority class in feature space using kernel SMOTE. It then finds the pre-images of the synthetic samples in input space from the distance relationship between input space and feature space. Finally, these pre-images are appended to the original data set, which is used to train an SVM. Experiments on real data sets show that the samples synthesized by kernel SMOTE are of higher quality than those produced by the SMOTE algorithm, which in turn improves SVM classification performance on imbalanced data sets.
Source: Acta Electronica Sinica (电子学报), 2009, Issue 11, pp. 2489-2495 (7 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (No. 60773177); Fujian Provincial Youth Talent Program (No. 2008F3108).
Keywords: imbalanced data set; support vector machine; input space; feature space; pre-image
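The three steps in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2), searches for neighbours only among minority samples, builds a synthetic feature-space point phi_new = (1 - delta) * phi(x_i) + delta * phi(x_j) as in SMOTE, computes its distances to the training points with the kernel trick, converts them to input-space distances by inverting ||phi(x) - phi(y)||^2 = 2 - 2k(x, y), and recovers a pre-image with the distance-based least-squares construction of Kwok and Tsang (2004), which the paper's pre-image step appears to resemble. All function names (`rbf_kernel`, `synth_sq_dists`, `feature_to_input_sq_dist`, `pre_image`, `kernel_smote`) and parameters (`gamma`, `k`, `n_anchor`) are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gram matrix of the (assumed) RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def synth_sq_dists(K, i, j, delta):
    """Squared feature-space distances from phi_new = (1-delta)*phi(x_i) + delta*phi(x_j)
    to every phi(x_l), using only the Gram matrix K (kernel trick)."""
    a, b = 1.0 - delta, delta
    self_term = a * a * K[i, i] + b * b * K[j, j] + 2.0 * a * b * K[i, j]  # ||phi_new||^2
    return self_term + np.diag(K) - 2.0 * (a * K[i] + b * K[j])


def feature_to_input_sq_dist(d2_feat, gamma):
    """Invert ||phi(x) - phi(y)||^2 = 2 - 2*exp(-gamma*||x - y||^2) (RBF kernel only)."""
    ratio = np.clip(1.0 - d2_feat / 2.0, 1e-12, 1.0)  # guards against round-off and log(0)
    return -np.log(ratio) / gamma


def pre_image(anchors, d2_input):
    """Distance-based pre-image: the point whose squared distances to `anchors`
    best match `d2_input` in the least-squares sense (Kwok & Tsang style)."""
    mean = anchors.mean(axis=0)
    C = anchors - mean                  # centred anchors, shape (n, dim)
    d0 = np.sum(C**2, axis=1)           # squared distances of anchors to their mean
    # For u = x - mean: d_l^2 - d0_l = ||u||^2 - 2 c_l.u, and C^T 1 = 0 removes ||u||^2,
    # so u solves the least-squares system C u = -0.5 * (d^2 - d0).
    u, *_ = np.linalg.lstsq(C, -0.5 * (d2_input - d0), rcond=None)
    return mean + u


def kernel_smote(X_min, n_new, k=5, gamma=0.5, n_anchor=8, rng=None):
    """Oversample the minority class X_min by interpolating in feature space,
    then map each synthetic feature-space point back to input space."""
    rng = np.random.default_rng(rng)
    K = rbf_kernel(X_min, X_min, gamma)
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K  # feature-space squared distances
    np.fill_diagonal(D2, np.inf)                    # a sample is not its own neighbour
    nn = np.argsort(D2, axis=1)[:, :k]              # k nearest minority neighbours
    n_anchor = min(n_anchor, len(X_min))
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = rng.choice(nn[i])
        delta = rng.random()                        # interpolation weight in [0, 1)
        d2_feat = synth_sq_dists(K, i, j, delta)
        anchors = np.argsort(d2_feat)[:n_anchor]    # minority samples closest to phi_new
        d2_in = feature_to_input_sq_dist(d2_feat[anchors], gamma)
        out.append(pre_image(X_min[anchors], d2_in))
    return np.array(out)


if __name__ == "__main__":
    X_min = np.random.default_rng(0).normal(size=(20, 2))
    X_new = kernel_smote(X_min, n_new=30, k=5, gamma=0.5, rng=0)
    print(X_new.shape)  # (30, 2)
```

With an RBF kernel the feature-space distance is a monotone function of the input-space distance, so the neighbour search selects the same neighbours as ordinary SMOTE; what changes is where the synthetic sample lands, because the interpolated feature-space point generally has no exact pre-image and is approximated from several anchors. The augmented minority set, e.g. `np.vstack([X_min, X_new])`, would then be combined with the majority class and passed to an RBF-kernel SVM such as scikit-learn's `SVC(kernel="rbf", gamma=...)`; reusing the same `gamma` in both places is an assumption of this sketch.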