
An Ensemble Learning Method Based on SMOTE for Imbalanced Data Sets
Cited by: 5
Abstract: The Synthetic Minority Over-sampling Technique (SMOTE) is an over-sampling method for handling imbalanced data. Building on SMOTE, B-SMOTE synthesizes new training samples by interpolating between borderline minority samples and other minority samples. Building on B-SMOTE, RB-SMOTE distinguishes among the borderline minority samples and allocates the number of new samples to each of them more finely, which improves classification accuracy on imbalanced data. This paper proposes a SMOTE-based ensemble learning algorithm for imbalanced data sets: RB-SMOTE is used to synthesize several new training sets with different imbalance ratios, one base classifier is trained on each set, and test samples are then classified by voting. Simulation experiments show that the proposed ensemble algorithm effectively mitigates the imbalance of imbalanced data sets and improves their classification performance.
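The pipeline described in the abstract (over-sample to several imbalance ratios, train one base classifier per resampled set, vote) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses plain SMOTE interpolation in place of RB-SMOTE, because the abstract does not give RB-SMOTE's allocation rule; the k-nearest-neighbour base classifier, the ratio values, and all function names (`smote`, `knn_predict`, `build_ensemble`, `ensemble_predict`) are hypothetical choices made for the example.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Plain SMOTE (assumption: stands in for RB-SMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def knn_predict(train, x, k=3):
    """Hypothetical base classifier: k-NN over (point, label) pairs."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > k else 0  # label 1 = minority class


def build_ensemble(majority, minority, ratios=(0.5, 0.75, 1.0), seed=0):
    """Build one resampled training set per target minority/majority ratio."""
    rng = random.Random(seed)
    members = []
    for ratio in ratios:
        target = int(round(ratio * len(majority)))
        n_new = max(0, target - len(minority))
        resampled = minority + smote(minority, n_new, rng=rng)
        train = [(p, 0) for p in majority] + [(p, 1) for p in resampled]
        members.append(train)
    return members


def ensemble_predict(members, x):
    """Majority vote over the base classifiers."""
    votes = sum(knn_predict(train, x) for train in members)
    return 1 if 2 * votes > len(members) else 0


# Toy data: 25 majority points on a grid, 4 minority points far away.
majority = [(0.5 * i, 0.5 * j) for i in range(5) for j in range(5)]
minority = [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
members = build_ensemble(majority, minority)
print(ensemble_predict(members, (5.2, 5.2)), ensemble_predict(members, (1.0, 1.0)))
```

Varying the target ratio gives each base classifier a differently balanced view of the data, which is the source of diversity in the vote. A closer reproduction would swap `smote` for a borderline-aware sampler.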
Authors: YANG Yi; MEI Ying (Faculty of Engineering, Lishui University, Lishui 323000, Zhejiang)
Affiliation: Faculty of Engineering, Lishui University
Source: Journal of Lishui University, 2020, No. 5, pp. 64-69 (6 pages)
Funding: Zhejiang Provincial Natural Science Foundation project "Research on Classification Methods for Imbalanced Data Sets" (LY18F030003).
Keywords: imbalanced data; over-sampling; classification; ensemble learning