Boosting imbalanced data learning with Wiener process oversampling (cited by: 1)


Abstract: Learning from imbalanced data is a challenging task in a wide range of applications and has attracted significant research effort from the machine learning and data mining communities. As a natural approach to this issue, oversampling balances the training samples by replicating existing samples or synthesizing new ones. In general, synthesis outperforms replication by supplying additional information on the minority class. However, the additional information needs to follow the same normal distribution as the training set, which further constrains the new samples to the predefined range of the training set. In this paper, we present the Wiener process oversampling (WPO) technique, which brings a physical phenomenon into sample synthesis. WPO constructs a robust decision region by expanding the attribute ranges of the training set while keeping the same normal distribution. WPO achieves satisfactory performance with much lower computational complexity. In addition, by integrating WPO with ensemble learning, the WPOBoost algorithm outperforms many prevalent imbalance learning solutions.
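The abstract does not spell out the WPO procedure, so the following is only a minimal sketch of the general idea of synthesizing minority samples with a discretized Wiener process (Brownian motion). Everything in it is an assumption made for illustration: the function name `wiener_oversample`, the parameters `n_steps`, `dt`, and `scale`, and the choice to start each walk at a randomly chosen minority sample and to scale the diffusion by the per-attribute standard deviation. It does not reproduce the distribution-preserving property claimed for WPO; it only shows how random-walk displacements can place synthetic samples outside the attribute range of the original training set.

```python
import numpy as np


def wiener_oversample(X_min, n_new, n_steps=10, dt=0.01, scale=1.0, seed=None):
    """Synthesize n_new minority samples with a discretized Wiener process.

    Illustrative assumption, not the paper's exact procedure: each synthetic
    sample starts at a randomly chosen minority sample and accumulates
    n_steps independent Gaussian increments, each with standard deviation
    scale * std(attribute) * sqrt(dt).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape

    # Per-attribute diffusion scale (hypothetical choice).
    sigma = scale * X_min.std(axis=0)

    # Starting points: minority samples drawn with replacement.
    starts = X_min[rng.integers(0, n, size=n_new)]

    # Sum standard normal increments over the steps, then rescale.
    increments = rng.normal(0.0, 1.0, size=(n_new, n_steps, d))
    displacement = sigma * np.sqrt(dt) * increments.sum(axis=1)
    return starts + displacement


# Usage: balance a toy two-class data set (synthetic data, for illustration).
rng = np.random.default_rng(0)
X_majority = rng.normal(0.0, 1.0, size=(200, 2))
X_minority = rng.normal(3.0, 0.5, size=(20, 2))

n_needed = len(X_majority) - len(X_minority)
X_synth = wiener_oversample(X_minority, n_new=n_needed, seed=1)
X_minority_balanced = np.vstack([X_minority, X_synth])
print(X_minority_balanced.shape)
```

Because the displacement is an unbounded Gaussian, synthetic points are not confined to the segment between two existing samples, as they would be under interpolation-based synthesis such as SMOTE. In this sketch, `scale` and `n_steps * dt` together control how far the samples can drift.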

Authors: Qian LI, Gang LI, Wenjia NIU, Yanan CAO, Liang CHANG, Jianlong TAN, Li GUO

Affiliations: Institute of Information Engineering; School of Information Technology; Guangxi Key Laboratory of Trusted Software

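Source: Frontiers of Computer Science (indexed in SCIE, EI, CSCD), 2017, No. 5, pp. 836-851 (16 pages). Chinese journal title: Frontiers of Computer Science in China (English edition)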

Funding: This research was partially supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200), the National Natural Science Foundation of China (Grant Nos. M1552006, 61403369, 61272427, and 61363030), the Xinjiang Uygur Autonomous Region Science and Technology Project (201230123), the Beijing Key Lab of Intelligent Telecommunication Software and Multimedia (ITSM201502), and the Guangxi Key Laboratory of Trusted Software (kx201418).

Keywords: imbalanced-data learning, oversampling, ensemble learning, Wiener process, AdaBoost

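Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; S512.103.7 [Agricultural Science: Crop Science]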

