
Comparative Study on Data Resampling in Unbalanced Data Classification (cited by: 4)
Abstract: Class imbalance is common in machine learning applications across many fields, such as anomaly detection and disease diagnosis. Data resampling is the most general-purpose approach to unbalanced data classification, and in recent years researchers have proposed a range of algorithms, including synthetic data sampling, cluster-based sampling, and ensemble sampling. The data sets these algorithms generate have different characteristics and affect different types of classifiers differently. This paper therefore proposes a systematic comparison method: using real data sets from several different domains, nine different types of data resampling algorithms are applied to balance the class distribution of the training set, and their effects on the precision, recall, and F value of three classic classifiers (C4.5 decision tree, support vector machine, and nearest neighbor) are compared. The experimental results show that the effect of a resampled data set depends on the classifier, and the paper identifies resampling algorithms suited to the three classifiers.
Authors: ZHONG Yu-qing (衷宇清); CHEN Wen-wen (陈文文); LI Zhao-hua (李昭桦) (Communication Center, Guangzhou Power Supply Co., Ltd., Guangzhou, Guangdong 510000, China; Guangdong Electric Power Design Institute Co., Ltd., China Energy Engineering Group, Guangzhou, Guangdong 510000, China)
Source: Communications Technology (《通信技术》), 2020, No. 6, pp. 1376-1384 (9 pages)
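Funding: Guangzhou Power Supply Bureau Co., Ltd. science and technology project "Research and Application of Visualized Traffic Scheduling Technology for Data Networks Based on an SDN Controller" (No. GZHKJXM20170117).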
Keywords: machine learning; data classification; unbalanced data; data resampling
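
The workflow summarized in the abstract (balance the training set by resampling, then score a classifier by precision, recall, and F value) can be illustrated with a minimal sketch. It rests on several assumptions: plain random oversampling stands in for the resampling step and is not claimed to be one of the nine algorithms the paper evaluates, the F value is taken to be the harmonic mean of precision and recall (F1), and the function names `random_oversample` and `precision_recall_f` as well as the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest one.
    (Stand-in technique; an assumption, not the paper's method.)"""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # All original samples belonging to this class.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall, and F value (assumed here to be F1)
    for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_value = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_value


# Toy unbalanced training set: 8 majority samples, 2 minority samples.
X_train = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [2.0], [2.2]]
y_train = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X_train, y_train)
balanced_counts = Counter(y_bal)

# Hypothetical predictions from some classifier on a held-out test set.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f_value = precision_recall_f(y_true, y_pred)
```

Only the training set is resampled in this sketch; the test labels keep their original distribution, so the scores reflect performance on unbalanced data.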
