
A New K-means-Based Improved SMOTE Algorithm and Its Application to Imbalanced Dataset Classification (cited by: 12)

A New Kind of Improving SMOTE Algorithm Based on K-means in Imbalanced Datasets
Abstract: In practical applications, one often encounters imbalanced data, where one class in a classification dataset has markedly fewer samples than the others. In a two-class dataset, the class with more samples is generally called the positive class and the class with fewer samples the negative class. To improve classification performance on imbalanced datasets, this paper proposes first using K-means to find the center of the negative class and then generating a new dataset according to the basic principle of SMOTE. Comparing the new dataset with the original imbalanced dataset under different classification algorithms shows that the improved algorithm clearly improves classification performance. Finally, pairwise paired t-tests are used to verify the effectiveness of the algorithm.
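The abstract names the two ingredients (a K-means center of the negative, i.e. minority, class and SMOTE-style interpolation) but does not give the exact generation rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each synthetic point is drawn on the line segment between a randomly chosen minority sample and its nearest K-means center. The function names `nearest`, `kmeans`, and `center_smote`, and the parameters `k`, `iters`, and `seed`, are hypothetical.

```python
import random


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of points; returns the k centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: label each point with its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


def center_smote(minority, n_new, k=1, seed=0):
    """Generate n_new synthetic minority points.

    Assumption (not stated in the abstract): each new point is
    x + gap * (c - x), where x is a random minority sample, c is its
    nearest K-means center, and gap is uniform in [0, 1).
    """
    rng = random.Random(seed)
    centers = kmeans(minority, k, seed=seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        c = centers[nearest(x, centers)]
        gap = rng.random()
        synthetic.append([xi + gap * (ci - xi) for xi, ci in zip(x, c)])
    return synthetic
```

Standard SMOTE interpolates between a minority sample and one of its nearest minority neighbors; interpolating toward a cluster center instead pulls synthetic points into the interior of the minority class, which is one plausible reading of "finding the negative-class center" before applying the SMOTE principle.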
Source: Mathematics in Practice and Theory (《数学的实践与认识》, PKU Core Journal), 2015, No. 19, pp. 198-206 (9 pages)
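Funding: National Natural Science Foundation of China (11401115); Guangdong Province Science and Technology Innovation Project (13KJ0396); Guangdong Province Science and Technology Plan Project (2013B051000075)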
Keywords: imbalanced data; SMOTE; K-means; negative-class center; paired t-test
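The record lists the paired t-test as the validation tool but does not reproduce the paper's scores. As a hedged illustration only, the statistic for paired samples can be sketched as below; the function name `paired_t_statistic` and its inputs (two equal-length lists of scores, for example one score per dataset for each of two methods) are assumptions made for this sketch.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for two paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d holds the pairwise
    differences and sd is the sample standard deviation (n - 1).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

The resulting t value is compared against the Student t distribution with n - 1 degrees of freedom to decide whether the mean difference between the two methods is significant.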


