2 articles found
An Imbalanced Dataset and Class Overlapping Classification Model for Big Data (cited by: 1)
1
Authors: Mini Prince, P.M. Joe Prathap. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 2, pp. 1009-1024 (16 pages)
Most modern technologies, such as social media, smart cities, and the internet of things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers get stuck in local optima. As a result, it is necessary to look into new methods for dealing with large data collections. Several solutions have been proposed for overcoming this issue. The rapid growth of the available data threatens to limit the usefulness of many traditional methods. Methods such as oversampling and undersampling have shown great promise in addressing class imbalance. Among these techniques, the Synthetic Minority Oversampling Technique (SMOTE) has produced the best results by generating synthetic samples for the minority class to create a balanced dataset. The issue is that their practical applicability is restricted to problems involving tens of thousands of instances or fewer. In this paper, we propose a parallel method using SMOTE and the MapReduce strategy, which distributes the operation of the algorithm among a group of computational nodes to address the aforementioned problem. Our proposed solution is divided into three stages. The first stage splits the data into different blocks using a mapping function, followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algorithm to solve the class imbalance problem. On each map block, a decision tree model is constructed. Finally, the decision tree blocks are combined to create a classification model. We used numerous datasets with up to 4 million instances in our experiments to test the proposed scheme's capabilities. As a result, the Hybrid SMOTE appears to have good scalability within the proposed framework, and it also cuts down the processing time.
Keywords: imbalanced dataset; class overlapping; SMOTE; MapReduce; parallel programming; oversampling
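The first two stages described in the abstract above (splitting the data into blocks, then oversampling the minority class inside each block) can be illustrated with a minimal, single-machine sketch. This is not the authors' implementation: it uses plain SMOTE interpolation rather than their hybrid variant, it omits the per-block decision trees and the final model combination, and the function names `smote`, `split_into_blocks`, and `balance_block` are hypothetical names chosen for this example.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Plain SMOTE (assumption: not the paper's hybrid variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        return []  # interpolation needs at least two minority points
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = minority[:base_idx] + minority[base_idx + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def split_into_blocks(rows, n_blocks):
    """Stand-in for the map step: deal rows round-robin into n_blocks lists."""
    return [rows[i::n_blocks] for i in range(n_blocks)]


def balance_block(block, minority_label=1, k=3, rng=None):
    """Per-block pre-processing: oversample the minority class up to the
    majority count. A block with fewer than two minority rows is returned
    unchanged, because there is nothing to interpolate between."""
    minority = [x for x, y in block if y == minority_label]
    majority = [x for x, y in block if y != minority_label]
    deficit = max(0, len(majority) - len(minority))
    new_points = smote(minority, deficit, k, rng)
    return block + [(x, minority_label) for x in new_points]
```

Each row is a `(features, label)` pair with `features` given as a tuple of floats. Calling `balance_block` on every list returned by `split_into_blocks` yields blocks that could then be handed to independent workers, which is the part a MapReduce framework would distribute.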
A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification
2
Authors: Ping Gong, Junguang Gao, Li Wang. Journal of Systems Science and Systems Engineering (SCIE, EI, CSCD), 2022, No. 6, pp. 728-752 (25 pages)
Credit risk assessment is an important risk management task for financial institutions. Machine learning-based approaches have made promising progress in credit risk assessment by treating it as an imbalanced binary classification task. However, few efforts have been made to deal simultaneously with the class overlap problem that accompanies imbalance. To this end, this study proposes a Tomek link and genetic algorithm (GA)-based under-sampling framework (TEUS) to address the class imbalance and overlap issues in binary credit classification by eliminating majority class instances while considering multi-perspective factors. TEUS first determines boundary majority instances with Tomek links, then takes the distance from each majority instance to its nearest boundary as the radius and assigns the density of opposite-class samples within the radius as the overlap potential of that majority instance. Second, TEUS weighs each non-borderline majority instance based on its information contribution in estimating class labels. After partitioning non-borderline majority instances into subgroups according to overlap potential and information contribution, TEUS applies GA to select samples from the subgroups and merges them with the minority samples into a new training set. Innovatively, the design of the fitness function in GA and the grouping of the non-borderline majority instances not only trade off the multi-perspective characteristics of instances but also help reduce the computational complexity of the sampling optimization search. Numerical experiments on real-world credit data sets demonstrate the effectiveness of the proposed TEUS.
Keywords: imbalance classification; credit classification; class overlap; evolutionary under-sampling; genetic algorithm
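The first step of TEUS mentioned in the abstract above, finding boundary majority instances with Tomek links, can be sketched as follows. A Tomek link is a pair of samples from opposite classes that are each other's nearest neighbour. This brute-force version is only an illustration under that standard definition: it does not reproduce the overlap potential, the information-contribution weighting, or the GA-based selection, and the names `nearest_index`, `tomek_links`, and `boundary_majority` are hypothetical.

```python
import math


def nearest_index(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels."""
    links = []
    for i in range(len(points)):
        j = nearest_index(points, i)
        if i < j and labels[i] != labels[j] and nearest_index(points, j) == i:
            links.append((i, j))
    return links


def boundary_majority(points, labels, majority_label=0):
    """Majority-class indices that take part in at least one Tomek link."""
    return sorted(
        idx
        for pair in tomek_links(points, labels)
        for idx in pair
        if labels[idx] == majority_label
    )
```

The quadratic nearest-neighbour search is acceptable for a small illustration; on real credit data sets a spatial index or a library routine would normally replace it.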