
Review of imbalanced data classification methods (不平衡数据集分类方法研究综述)

Cited by: 19
Abstract: Social development generates large volumes of data, and class imbalance has become a prominent feature of many datasets. How to achieve better classification performance on imbalanced datasets has therefore become a research hotspot in machine learning. This paper surveys existing classification methods for imbalanced datasets at three levels: data sampling methods, improved machine-learning algorithms, and combined methods. For each category, it summarizes and analyzes the problems addressed, the underlying algorithmic ideas, the application scenarios, and the respective advantages and disadvantages. It also summarizes open problems in imbalanced-data classification and offers an outlook on future research directions.
Authors: Zhou Yu (周玉); Sun Hongyu (孙红玉); Fang Qian (房倩); Xia Hao (夏浩) (School of Electric Power, North China University of Water Resources & Electric Power, Zhengzhou 450045, China)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2022, No. 6, pp. 1615-1621 (7 pages)
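Funding: Training Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (2018GGJS079); National Natural Science Foundation of China (U1504622, 31671580).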
Keywords: unbalanced data set; classification; data processing; machine learning
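
Of the three levels named in the abstract, the data-level (sampling) family is the simplest to illustrate. The following is a minimal sketch of random oversampling, which duplicates minority-class samples until every class is as large as the largest one. It is not taken from the paper: the function name `random_oversample`, the `seed` parameter, and the toy data are assumptions made for illustration only.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())      # size of the largest class

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement until this class reaches the target size.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (assumed): five majority samples (label 0), one minority sample (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After the call, both classes contain five samples. Plain duplication adds no new information and can encourage overfitting to the repeated minority points, which is one reason synthetic approaches such as SMOTE generate interpolated samples instead.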

