Abstract
As society develops, large volumes of data are generated, and imbalance has become a prominent characteristic of many data sets. How to achieve better classification performance on unbalanced data sets has therefore become a research hotspot in machine learning. This paper surveys existing classification methods for unbalanced data sets, reviewing and summarizing them at three levels: sampling methods for unbalanced data, improved algorithms based on machine learning, and combination methods. For each category, it summarizes and analyzes the problems addressed, the underlying algorithmic ideas, the application scenarios, and the respective advantages and disadvantages. It also discusses open problems in unbalanced data set classification and offers an outlook on future research directions.
Authors
Zhou Yu (周玉)
Sun Hongyu (孙红玉)
Fang Qian (房倩)
Xia Hao (夏浩)
School of Electric Power, North China University of Water Resources and Electric Power, Zhengzhou 450045, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2022, Issue 6, pp. 1615-1621 (7 pages)
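Funding
Training Program for Young Backbone Teachers in Higher Education Institutions of Henan Province (2018GGJS079)
National Natural Science Foundation of China (U1504622, 31671580)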
Keywords
unbalanced data set
classification
data processing
machine learning