
Research on an Imbalanced Data Sampling Method Based on Stratification and Recombination (cited by: 3)
Abstract: Imbalanced data are difficult to use with mainstream classifiers in machine learning. To address this for multi-class imbalanced data, a sampling method based on hyperplane sorting, stratified sampling, and multi-class sample recombination is proposed to obtain class-balanced data sets for machine learning. First, the maximum common sampling number across the classes is computed and used to determine the number of sampling portions for each class. Next, the data in each class are re-sorted by their distance to the classification hyperplane, and each class is sampled in strata at equal intervals, so that the total number sampled from each class is a multiple of the maximum common sampling number; these strata form the base subsamples within each class. Finally, the base subsamples are recombined by permutation and combination to construct balanced data sets. Training and testing with data classification algorithms show that the sampling method not only balances the data across multiple classes but also preserves the original distribution characteristics of the samples, and improves the accuracy of downstream machine learning algorithms.
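The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the "maximum common sampling number" is taken to be the size of the smallest class, a single linear hyperplane `(w, b)` is assumed to be given for all classes, leftover points that do not fill a whole portion are dropped, and the names `hyperplane_distance`, `split_class`, and `balanced_datasets` are hypothetical.

```python
from itertools import product
from math import sqrt


def hyperplane_distance(x, w, b):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def split_class(points, w, b, base_size):
    """Sort one class by distance, then deal it into equal-interval portions.

    Returns k = len(points) // base_size portions of exactly base_size points.
    Portion j takes sorted positions j, j + k, j + 2k, ..., so every portion
    spans the whole distance range of the class.
    """
    k = len(points) // base_size
    ordered = sorted(points, key=lambda x: hyperplane_distance(x, w, b))
    usable = ordered[: k * base_size]  # assumption: drop the remainder
    return [usable[j::k] for j in range(k)]


def balanced_datasets(data_by_class, w, b):
    """Yield balanced data sets as lists of (point, label) pairs."""
    # Assumption: the common sampling number is the smallest class size.
    base_size = min(len(points) for points in data_by_class.values())
    portions = {
        label: split_class(points, w, b, base_size)
        for label, points in data_by_class.items()
    }
    labels = sorted(portions)
    # Recombination: pick one portion per class, in every possible way.
    for combo in product(*(portions[label] for label in labels)):
        yield [(x, label) for label, part in zip(labels, combo) for x in part]


if __name__ == "__main__":
    data = {
        "a": [(float(i), 0.0) for i in range(6)],
        "b": [(float(i), 1.0) for i in range(2)],
        "c": [(float(i), 2.0) for i in range(4)],
    }
    sets = list(balanced_datasets(data, w=(1.0, 0.0), b=0.0))
    print(len(sets), "balanced data sets of", len(sets[0]), "points each")
```

With class sizes 6, 2, and 4, the sketch uses a base size of 2, splits the classes into 3, 1, and 2 portions respectively, and recombines them into 3 × 1 × 2 balanced data sets that each hold two points per class.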
Authors: DENG Mingyang (邓明阳); GUO Yingshi (郭应时); LIU Tong (刘通) (School of Automobile, Chang’an University, Xi’an 710064, China; Department of Automobile Engineering, College of Humanities & Information, Changchun University of Technology, Changchun 130122, China; College of Traffic & Transportation, Chongqing Jiaotong University, Chongqing 400074, China)
Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), indexed in CAS and the Peking University Core Journals list (北大核心), 2021, No. 8, pp. 122-128 (7 pages)
Funding: National Key R&D Program of China (2019YFB1600500).
Keywords: data processing; imbalanced data; stratified sampling; permutation and combination; composite evaluation