An Imbalanced Dataset and Class Overlapping Classification Model for Big Data (cited by: 1)


Abstract: Most modern technologies, such as social media, smart cities, and the Internet of Things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers get stuck in local optima. As a result, new methods for handling large data collections are needed. Several solutions have been proposed for overcoming this issue. The rapid growth of the available data threatens to limit the usefulness of many traditional methods. Methods such as oversampling and undersampling have shown great promise in addressing class imbalance. Among these techniques, the Synthetic Minority Oversampling Technique (SMOTE) has produced the best results by generating synthetic samples for the minority class to create a balanced dataset. The issue is that the practical applicability of these techniques is restricted to problems involving tens of thousands of instances or fewer. In this paper, we propose a parallel method using SMOTE and a MapReduce strategy, which distributes the operation of the algorithm among a group of computational nodes to address this problem. Our proposed solution is divided into three stages. The first stage splits the data into different blocks using a mapping function, followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algorithm to solve the class imbalance problem. On each map block, a decision tree model is constructed. Finally, the decision tree blocks are combined to create a classification model. We used numerous datasets with up to 4 million instances in our experiments to test the proposed scheme's capabilities. The results indicate that the hybrid SMOTE scales well within the proposed framework and also cuts down the processing time.
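The three stages described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not specify the hybrid SMOTE variant, the tree learner, or the combination rule, so this sketch assumes plain SMOTE interpolation, a depth-1 decision tree (a stump) as a stand-in for the per-block decision tree, and a majority vote to combine the block models. All function names (`smote`, `split_blocks`, `balance_block`, `fit_stump`, `predict_ensemble`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, rng=None):
    """Plain SMOTE (assumed variant): interpolate between a minority point
    and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def split_blocks(rows, n_blocks):
    """Stage 1 (map): deal rows round-robin into n_blocks partitions."""
    return [rows[i::n_blocks] for i in range(n_blocks)]


def balance_block(block, minority_label, rng):
    """Stage 1 pre-processing: oversample the minority class inside one block."""
    minority = [x for x, y in block if y == minority_label]
    n_majority = len(block) - len(minority)
    extra = smote(minority, max(0, n_majority - len(minority)), rng=rng)
    return block + [(x, minority_label) for x in extra]


def fit_stump(block):
    """Stage 2: depth-1 tree (stand-in for a full decision tree); picks the
    (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(block[0][0])):
        for threshold in sorted({x[f] for x, _ in block}):
            left = Counter(y for x, y in block if x[f] <= threshold)
            right = Counter(y for x, y in block if x[f] > threshold)
            if not left or not right:
                continue
            errors = (sum(left.values()) - max(left.values())
                      + sum(right.values()) - max(right.values()))
            if best is None or errors < best[0]:
                best = (errors, f, threshold,
                        left.most_common(1)[0][0], right.most_common(1)[0][0])
    if best is None:  # all rows share the same features: predict the majority label
        label = Counter(y for _, y in block).most_common(1)[0][0]
        return (0, math.inf, label, label)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def predict_ensemble(stumps, x):
    """Stage 3 (reduce): majority vote over the per-block models (assumed rule)."""
    return Counter(predict_stump(s, x) for s in stumps).most_common(1)[0][0]


# Toy data: 300 majority rows (label 0) and 30 minority rows (label 1).
rng = random.Random(42)
rows = []
for i in range(330):
    label = 1 if i % 11 == 0 else 0
    centre = 4.0 if label == 1 else 0.0
    rows.append(((rng.gauss(centre, 1), rng.gauss(centre, 1)), label))

blocks = [balance_block(b, 1, rng) for b in split_blocks(rows, 3)]
stumps = [fit_stump(b) for b in blocks]
```

In a real MapReduce or Spark deployment, `balance_block` and `fit_stump` would run inside the mappers on separate nodes and only the fitted models would be sent to the reducer. Because each block is oversampled independently, no node ever needs to hold the full dataset, which is the property the abstract relies on for scalability.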

Authors: Mini Prince, P. M. Joe Prathap

Affiliations: Department of Information Technology; Department of Information Technology

Source: Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 2, pp. 1009-1024 (16 pages)

Keywords: imbalanced dataset; class overlapping; SMOTE; MapReduce; parallel programming; oversampling

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

