
GSA: An Imbalanced Data Set Sampling Method Based on Genetic Algorithm
Abstract: Training classification models is a fundamental problem in machine learning, and the quality of a classifier depends critically on the quality of its training samples. Traditional classification models assume that every class contains roughly the same number of samples and ignore the effect of class imbalance, yet imbalanced samples strongly degrade a model's predictive ability. To balance the data, we propose GSA (Genetic-SMOTE Algorithm), a sample synthesis method that combines a genetic algorithm with SMOTE (Synthetic Minority Oversampling Technique). For classes with few samples, the algorithm encodes the sample features and synthesizes new samples using genetic-algorithm operations, thereby improving class balance. Comparative experiments show that the algorithm keeps the newly synthesized samples similar to the original samples while enriching the diversity of the sample set, and thus improves the model's classification accuracy.
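The abstract does not give GSA's exact encoding, fitness function, or genetic operators, so the following is only a minimal sketch of how SMOTE and genetic-algorithm ideas can be combined, not the authors' implementation. It assumes each numeric feature is treated as one "gene"; parents are chosen SMOTE-style (a minority sample plus one of its k nearest minority neighbours); a per-gene arithmetic crossover interpolates between the parents (with a single shared interpolation factor this reduces exactly to standard SMOTE); and an occasional Gaussian mutation adds diversity while every gene is clipped to the minority class's observed range. The names `genetic_smote`, `k_nearest`, `mutation_rate`, and `mutation_scale` are hypothetical, and the fitness-based selection step a full genetic algorithm would use is omitted.

```python
import random
from typing import List, Sequence

Sample = List[float]


def k_nearest(index: int, data: Sequence[Sample], k: int) -> List[int]:
    """Indices of the k nearest neighbours of data[index] (squared Euclidean), excluding itself."""
    target = data[index]
    dists = []
    for j, other in enumerate(data):
        if j != index:
            d = sum((a - b) ** 2 for a, b in zip(target, other))
            dists.append((d, j))
    dists.sort()
    return [j for _, j in dists[:k]]


def genetic_smote(minority: Sequence[Sample], n_new: int, k: int = 5,
                  mutation_rate: float = 0.1, mutation_scale: float = 0.05,
                  seed: int = 0) -> List[Sample]:
    """Hypothetical GA-flavoured SMOTE oversampler; each feature is one 'gene'."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    dims = len(minority[0])
    # Per-gene bounds observed in the minority class (used to clip children).
    lows = [min(s[g] for s in minority) for g in range(dims)]
    highs = [max(s[g] for s in minority) for g in range(dims)]
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(i, minority, k) for i in range(len(minority))]

    synthetic = []
    for _ in range(n_new):
        # Parent selection, SMOTE-style: a random minority sample and one of its neighbours.
        i = rng.randrange(len(minority))
        a, b = minority[i], minority[rng.choice(neighbours[i])]
        # Arithmetic crossover: an independent interpolation factor per gene.
        child = [ag + rng.random() * (bg - ag) for ag, bg in zip(a, b)]
        for g in range(dims):
            # Mutation: occasional Gaussian nudge scaled to the gene's observed range.
            if rng.random() < mutation_rate:
                child[g] += rng.gauss(0.0, mutation_scale * (highs[g] - lows[g]))
            # Keep every gene inside the minority class's observed feature range.
            child[g] = min(highs[g], max(lows[g], child[g]))
        synthetic.append(child)
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.4], [1.2, 2.1]]
    new = genetic_smote(minority, n_new=6, k=2)
    print(len(new))  # 6
```

Clipping to the observed range is one simple way to keep synthetic samples close to the originals (the similarity property the abstract emphasizes), while the per-gene crossover and mutation spread children more widely than plain SMOTE's line-segment interpolation.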
Authors: ZHANG Xun (张巡), LI Ping (黎平), LIU Ping (刘萍) (School of Big Data and Computer Science, Guizhou Normal University, Guiyang 550025, China)
Source: Guizhou Science (《贵州科学》), 2018, No. 2, pp. 93-96 (4 pages)
Keywords: classification model; sample imbalance; genetic algorithm; SMOTE algorithm; GSA algorithm