Journal Articles
6 articles found
1. A novel overlapping minimization SMOTE algorithm for imbalanced classification
Authors: Yulin HE, Xuan LU, Philippe FOURNIER-VIGER, Joshua Zhexue HUANG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2024, Issue 9, pp. 1266-1281 (16 pages).
The synthetic minority oversampling technique (SMOTE) is a popular algorithm for reducing the impact of class imbalance when building classifiers, and it has received many enhancements over the past 20 years. SMOTE and its variants synthesize minority-class sample points in the original sample space to alleviate the adverse effects of class imbalance. This approach works well in many cases, but problems arise when synthetic sample points are generated in overlapping areas between different classes, which further complicates classifier training. To address this issue, this paper proposes a novel generalization-oriented rather than imputation-oriented minority-class sample point generation algorithm, named overlapping minimization SMOTE (OM-SMOTE). The algorithm is designed specifically for binary imbalanced classification problems. OM-SMOTE first maps the original sample points into a new sample space by balancing sample encoding and classifier generalization. It then employs a set of sophisticated minority-class sample point imputation rules to generate synthetic sample points that lie as far as possible from the overlapping areas between classes. Extensive experiments on 32 imbalanced datasets validate the effectiveness of OM-SMOTE. The results show that using OM-SMOTE to generate synthetic minority-class sample points leads to better training performance for naive Bayes, support vector machine, decision tree, and logistic regression classifiers than 11 state-of-the-art SMOTE-based imputation algorithms. This demonstrates that OM-SMOTE is a viable approach for supporting the training of high-quality classifiers for imbalanced classification. The implementation of OM-SMOTE is shared publicly on GitHub at https://github.com/luxuan123123/OM-SMOTE/.
Keywords: imbalanced classification; synthetic minority oversampling technique (SMOTE); majority-class sample point; minority-class sample point; generalization capability; overlapping minimization
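For background, classic SMOTE creates each synthetic minority point by linear interpolation between a minority sample and one of its minority-class nearest neighbors; OM-SMOTE keeps this interpolation idea but first re-encodes the samples and steers synthesis away from class-overlap regions. The sketch below illustrates only the plain interpolation step, with function and parameter names of our own choosing, and is not the authors' OM-SMOTE implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_interpolate(X_min, n_new, k=5, random_state=0):
    """Illustrative vanilla SMOTE step: synthesize n_new minority points by
    linear interpolation between a minority sample and one of its k nearest
    minority neighbors. Not the OM-SMOTE algorithm itself."""
    rng = np.random.default_rng(random_state)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)           # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))         # pick a minority sample
        j = idx[i, rng.integers(1, k + 1)]   # pick one of its k neighbors
        gap = rng.random()                   # interpolation factor in (0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```

In practice, n_new is usually chosen so that the two classes end up roughly balanced after the synthetic points are appended to the training set.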
2. An Imbalanced Data Classification Method Based on Hybrid Resampling and Fine Cost Sensitive Support Vector Machine (cited by 1)
Authors: Bo Zhu, Xiaona Jing, Lan Qiu, Runbo Li. Computers, Materials & Continua (SCIE, EI), 2024, Issue 6, pp. 3977-3999 (23 pages).
When building a classification model, the scenario where the samples of one class significantly outnumber those of the other class is called data imbalance. Data imbalance causes the trained classification model to favor the majority class (usually defined as the negative class), which may harm the accuracy of the minority class (usually defined as the positive class) and lead to poor overall performance. This article proposes a method called MSHR-FCSSVM for imbalanced data classification, based on a new hybrid resampling approach (MSHR) and a new fine cost-sensitive support vector machine (CS-SVM) classifier (FCSSVM). MSHR measures the separability of each negative sample through its silhouette value, calculated with the Mahalanobis distance between samples; on this basis, so-called pseudo-negative samples are screened out, used to generate new positive samples through linear interpolation (the over-sampling step), and finally deleted (the under-sampling step). This approach replaces pseudo-negative samples with the newly generated positive samples one by one to clear up inter-class overlap on the borderline, without changing the overall scale of the dataset. FCSSVM is an improved version of the traditional CS-SVM. It simultaneously considers the influence of both the imbalance in sample numbers and the class distribution on classification, and it finely tunes the class cost weights with an efficient optimization algorithm based on the physical phenomenon of rime ice (the RIME algorithm), using cross-validation accuracy as the fitness function, to accurately adjust the classification borderline. To verify the effectiveness of the proposed method, a series of experiments is carried out on 20 imbalanced datasets, including both mildly and extremely imbalanced ones. The experimental results show that MSHR-FCSSVM performs better than the comparison methods in most cases, and that both MSHR and FCSSVM play significant roles.
Keywords: imbalanced data classification; silhouette value; Mahalanobis distance; RIME algorithm; CS-SVM
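The cost-sensitive side of the method can be illustrated with an ordinary class-weighted SVM: scikit-learn's class_weight argument raises the penalty for minority-class errors, and cross-validation accuracy provides the fitness signal that a weight search (the RIME optimizer in the paper) would maximize. This is a minimal sketch assuming binary labels {0, 1} with 1 as the minority class; the weight values and the grid sweep are placeholders, not the paper's tuned costs.

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def weighted_svm_cv_score(X, y, minority_weight):
    """Illustrative cost-sensitive SVM: penalize minority-class (label 1)
    errors more heavily via class_weight, scored by cross-validated accuracy,
    the same fitness signal a RIME-style weight search could maximize."""
    clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: minority_weight})
    return cross_val_score(clf, X, y, cv=5).mean()

# Stand-in for the optimizer: sweep a small candidate grid and keep the
# weight with the best cross-validated fitness.
# best_w = max([1, 2, 5, 10], key=lambda w: weighted_svm_cv_score(X, y, w))
```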
3. Joint Sample Position Based Noise Filtering and Mean Shift Clustering for Imbalanced Classification Learning
Authors: Lilong Duan, Wei Xue, Jun Huang, Xiao Zheng. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2024, Issue 1, pp. 216-231 (16 pages).
The problem of imbalanced data classification learning has received much attention. Conventional classification algorithms are susceptible to data skew, favoring majority samples and ignoring minority samples. The majority weighted minority oversampling technique (MWMOTE) is an effective approach to this problem; however, it may suffer from inadequate noise filtering and from synthesizing samples identical to the original minority data. To this end, we propose an improved MWMOTE method named joint sample position based noise filtering and mean shift clustering (SPMSC) to solve these problems. First, to effectively eliminate the effect of noisy samples, SPMSC uses a new noise filtering mechanism that determines whether a minority sample is noisy based on its position and distribution relative to the majority samples. Then, because MWMOTE may generate duplicate samples, we employ the mean shift algorithm to cluster minority samples and reduce synthetic replicates. Finally, data cleaning is performed on the processed data to further eliminate class overlap. Experiments on extensive benchmark datasets demonstrate the effectiveness of SPMSC compared with other sampling methods.
Keywords: imbalanced data classification; oversampling; noise filtering; clustering
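The mean-shift step can be illustrated with scikit-learn's MeanShift: clustering the minority class first lets an oversampler work per cluster instead of copying individual points verbatim, which is the duplicate-sample problem the abstract mentions. The sketch below is a generic illustration of that idea, not the SPMSC pipeline itself; the function name and bandwidth handling are our own.

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_minority(X_min, bandwidth=None):
    """Illustrative use of mean shift to group minority samples before
    oversampling, so that synthesis can be driven by cluster centers and the
    spread of each cluster's members rather than by verbatim copies of
    individual points. Not the SPMSC method from the paper."""
    ms = MeanShift(bandwidth=bandwidth).fit(X_min)   # bandwidth=None: auto-estimate
    labels, centers = ms.labels_, ms.cluster_centers_
    clusters = [X_min[labels == c] for c in np.unique(labels)]
    return clusters, centers
```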
4. Cost-Sensitive Dual-Stream Residual Networks for Imbalanced Classification
Authors: Congcong Ma, Jiaqi Mi, Wanlin Gao, Sha Tao. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9, pp. 4243-4261 (19 pages).
Imbalanced data classification is the task of classifying datasets in which there is a significant disparity in the number of samples between classes. The task is prevalent in practical scenarios such as industrial fault diagnosis, network intrusion detection, and cancer detection. In imbalanced classification, the focus is typically on achieving high recognition accuracy for the minority class. However, due to the challenges presented by imbalanced multi-class datasets, such as the scarcity of samples in minority classes and complex inter-class relationships with overlapping boundaries, existing methods often do not perform well on multi-class imbalanced data, particularly in recognizing minority classes with high accuracy. This paper therefore proposes a multi-class imbalanced data classification method called CSDSResNet, based on a cost-sensitive dual-stream residual network. First, to address the limited number of minority-class samples in imbalanced datasets, a dual-stream residual network backbone is designed to enhance the model's feature extraction capability. Next, considering the complexities arising from imbalanced inter-class sample quantities and overlapping inter-class boundaries in multi-class imbalanced datasets, a dedicated cost-sensitive loss function is devised. This loss function places more emphasis on the minority class and on challenging classes with high inter-class similarity, thereby improving the model's classification ability. Finally, the effectiveness and generalization of the proposed method are evaluated on two datasets, 'DryBeans' and 'Electric Motor Defects'. The experimental results demonstrate that CSDSResNet achieves the best performance on imbalanced datasets, with macro F1-score values improving by 2.9% and 1.9% on the two datasets, respectively, compared with current state-of-the-art classification methods. It also achieves the highest precision in single-class recognition of the minority class.
Keywords: deep learning; imbalanced data classification; fault diagnosis; cost-sensitivity
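A common way to make a deep classifier cost-sensitive, and a reasonable mental model for the loss described above, is to weight each class's cross-entropy term inversely to its sample count so that minority-class mistakes cost more. The sketch below shows that baseline in PyTorch; the paper's loss additionally up-weights classes with high inter-class similarity, which is not reproduced here, and the class counts in the usage comment are hypothetical.

```python
import torch
import torch.nn as nn

def class_weighted_ce(class_counts):
    """Illustrative cost-sensitive loss: weight each class inversely to its
    sample count so that errors on minority classes contribute more to the
    loss. Not the CSDSResNet loss, which also accounts for inter-class
    similarity."""
    counts = torch.tensor(class_counts, dtype=torch.float32)
    weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
    return nn.CrossEntropyLoss(weight=weights)

# Example (hypothetical counts): three classes with 900, 80, and 20 samples.
# criterion = class_weighted_ce([900, 80, 20])
# loss = criterion(logits, targets)
```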
5. A Credit Card Fraud Detection Model Based on Multi-Feature Fusion and Generative Adversarial Network (cited by 1)
Authors: Yalong Xie, Aiping Li, Biyin Hu, Liqun Gao, Hongkui Tu. Computers, Materials & Continua (SCIE, EI), 2023, Issue 9, pp. 2707-2726 (20 pages).
Credit Card Fraud Detection (CCFD) is an essential technology for banking institutions to control fraud risks and safeguard their reputation. Class imbalance and insufficient representation of the feature data relating to credit card transactions are two prevalent issues in the current CCFD research field, and both significantly impact the performance of classification models. To address these issues, this research proposes a novel CCFD model based on Multi-feature Fusion and Generative Adversarial Networks (MFGAN). The MFGAN model consists of two modules: a multi-feature fusion module that integrates static and dynamic behavior data of cardholders into a unified high-dimensional feature space, and a balance module based on a generative adversarial network that decreases the class imbalance ratio. The effectiveness of the MFGAN model is validated on two real credit card datasets. The impact of different class balance ratios on the performance of four resampling models is analyzed, and the contribution of the two modules to the performance of the MFGAN model is investigated via ablation experiments. Experimental results demonstrate that the proposed model outperforms state-of-the-art models in terms of recall, F1, and Area Under the Curve (AUC), which means that the MFGAN model can help banks find more fraudulent transactions and reduce fraud losses.
Keywords: credit card fraud detection; imbalanced classification; feature fusion; generative adversarial networks; anti-fraud systems
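The balance module's core idea, training a GAN on minority-class (fraud) rows and sampling the generator to shrink the imbalance ratio, can be sketched with a small tabular GAN in PyTorch. Everything below (network sizes, noise dimension, feature dimension) is a placeholder assumption for illustration and is not the MFGAN architecture from the paper.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps random noise to a synthetic transaction feature vector."""
    def __init__(self, noise_dim=16, feat_dim=30):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores how likely a feature vector is a real minority-class row."""
    def __init__(self, feat_dim=30):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, x):
        return self.net(x)   # raw logit; paired with BCEWithLogitsLoss below

def train_step(gen, disc, real_minority, opt_g, opt_d, noise_dim=16):
    """One adversarial step: the discriminator learns to tell real minority
    rows from generated ones, and the generator learns to fool it."""
    bce = nn.BCEWithLogitsLoss()
    n = len(real_minority)
    fake = gen(torch.randn(n, noise_dim))
    # Discriminator update
    d_loss = bce(disc(real_minority), torch.ones(n, 1)) + \
             bce(disc(fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update
    g_loss = bce(disc(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

After training, sampling the generator yields additional synthetic fraud rows that can be appended to the training set to reduce the class imbalance ratio.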
6. An Ensemble Tree Classifier for Highly Imbalanced Data Classification
Authors: SHI Peibei, WANG Zhong. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2021, Issue 6, pp. 2250-2266 (17 pages).
The performance of traditional imbalanced classification algorithms degrades when dealing with highly imbalanced data, and how to handle such data remains a difficult problem. In this paper, the authors propose an ensemble tree classifier for highly imbalanced data classification. The ensemble tree classifier is constructed with a complete binary tree structure. A mathematical model is established based on the features and classification performance of the classifier, and it is proven that the model parameters of the ensemble classifier can be solved by calculation. First, the AdaBoost method is used as the base classifier to construct the tree structure model. Then, the classification cost of the model is calculated, yielding a quantitative mathematical description of the relationship between the cost and the features of the ensemble tree classifier. The classification cost is then cast as an optimization problem, and the parameters of the ensemble tree classifier are derived theoretically. The approach is tested on several highly imbalanced datasets from different fields, with AUC (area under the curve) and F-measure as evaluation criteria. Compared with traditional imbalanced classification algorithms, the ensemble tree classifier achieves better classification performance.
Keywords: ensemble learning; F-measure; imbalanced classification; mathematical model
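As a point of reference for the base learner, the sketch below trains a single AdaBoost classifier and scores it with AUC and F-measure, the two criteria used in the paper. The paper's contribution, arranging many such classifiers in a complete binary tree whose parameters are derived from a cost model, is not reproduced here; labels are assumed binary with 1 as the minority (positive) class.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

def adaboost_baseline(X, y):
    """Illustrative baseline: one AdaBoost classifier evaluated with AUC and
    F-measure on a stratified hold-out split. Assumes binary labels {0, 1}
    with 1 the minority class; not the ensemble tree classifier itself."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]   # probability of the positive class
    return roc_auc_score(y_te, scores), f1_score(y_te, clf.predict(X_te))
```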