Ensemble Classification Algorithm Based on Improved CGAN for Imbalanced Data
Abstract A conditional generative adversarial network (CGAN) can learn the distribution of the data and generate new samples that conform to the original data distribution, so using it as an oversampling method can improve classification performance on imbalanced data. However, when the minority class contains few samples, the CGAN cannot fully learn its distribution, and the generated samples are of poor quality. To address this, the paper proposes an ensemble classification algorithm for imbalanced data based on an improved CGAN. First, SMOTEENN (SMOTE oversampling combined with edited nearest neighbor cleaning) is used to quickly generate minority class samples until they reach a certain scale, and a CGAN is trained on this enlarged set so that it can fully learn the distribution of the minority class. The trained CGAN then regenerates minority class samples that conform to the original data distribution, and these are used to build a balanced dataset. Finally, with CART decision trees as base classifiers, an improved Adaboost method is trained on the balanced dataset to obtain the final classification model. F1 value, AUC, and G-mean are used as evaluation metrics. Experimental results on 8 public datasets show that the proposed method significantly improves classification accuracy on imbalanced data.
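The abstract names F1 value and G-mean as evaluation metrics but does not define them. The sketch below assumes the standard binary definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of the true positive and true negative rates) and treats the minority class as the positive label `1`. The function names and the toy labels are illustrative only and do not come from the paper. AUC is left out because it needs classifier scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_value(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """G-mean = sqrt(TPR * TNR); a rate with an empty denominator counts as 0."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Toy imbalanced labels (hypothetical): 4 minority and 16 majority samples.
y_true = [1] * 4 + [0] * 16
# A classifier that finds 3 of 4 minority samples but raises 4 false alarms.
y_pred = [1, 1, 1, 0] + [1] * 4 + [0] * 12
# A trivial classifier that always predicts the majority class.
y_majority = [0] * 20

for name, pred in [("classifier", y_pred), ("always-majority", y_majority)]:
    tp, fp, tn, fn = confusion_counts(y_true, pred)
    accuracy = (tp + tn) / len(y_true)
    print(name, round(accuracy, 3), round(f1_value(tp, fp, fn), 3),
          round(g_mean(tp, fp, tn, fn), 3))
```

On these toy labels the always-majority predictor scores higher accuracy than the other classifier, yet its F1 and G-mean are both zero, which is the usual reason for preferring such metrics on imbalanced data.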
Authors LIU Ning (刘宁); ZHU Bo (朱波); JING Xiao-na (荆晓娜); YIN Yan-chao (阴艳超) (Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650504, China)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list; 2023, No. 9, pp. 1918-1924 (7 pages)
Funding Supported by the National Natural Science Foundation of China (52065033).
Keywords imbalanced data; conditional generative adversarial network; over-sampling; Adaboost; ensemble learning
