Ensemble Classification Algorithm Based on Improved CGAN for Imbalanced Data

Cited by: 1


Abstract: A conditional generative adversarial network (CGAN) can learn the distribution of the data and generate new samples that conform to the original distribution. Used as an oversampling method, it can improve classification performance on imbalanced data. However, when the minority class contains few samples, the CGAN cannot fully learn its distribution, and the generated samples are of poor quality. To address this, the paper proposes an ensemble classification algorithm for imbalanced data based on an improved CGAN. First, SMOTEENN (SMOTE oversampling combined with edited nearest neighbours cleaning) is used to quickly generate minority-class samples until they reach a certain scale, and a CGAN is trained on them so that it can fully learn the minority-class distribution. The trained CGAN then regenerates minority-class samples that conform to the original data distribution, and these are used to build a balanced dataset. Finally, with CART decision trees as base classifiers, an improved AdaBoost method is trained on the balanced dataset to obtain the final classification model. F1 score, AUC, and G-mean are used as evaluation metrics. Experiments on 8 public datasets show that the proposed method significantly improves classification accuracy on imbalanced data.

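Authors: LIU Ning; ZHU Bo; JING Xiao-na; YIN Yan-chao (Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650504, China)

Affiliation: Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 9, pp. 1918-1924 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (52065033).

Keywords: imbalanced data; conditional generative adversarial network; over-sampling; AdaBoost; ensemble learning

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]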
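The abstract names F1, AUC, and G-mean as its evaluation metrics without defining them. The sketch below shows their standard textbook definitions for binary classification. It is not taken from the paper: the function names, the convention that the minority class is labelled `1`, and the rank-based AUC computation are all assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the minority class (assumed label)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def f1_value(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, fn, tn):
    """G-mean = sqrt(TPR * TNR); it is low if either class is poorly recognised."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def auc_value(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is O(n_pos * n_neg) and is
    meant for clarity, not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On imbalanced data these metrics are usually preferred over plain accuracy, because a classifier that always predicts the majority class can score high accuracy while its F1 and G-mean on the minority class are zero.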

