21 articles found
Over-sampling algorithm for imbalanced data classification (Cited by: 9)
1
Authors: XU Xiaolong, CHEN Wen, SUN Yanfei. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2019, No. 6, pp. 1182-1191 (10 pages)
For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE encounters the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups (core samples, borderline samples and noise samples), and then the noise samples of the minority class are removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
Keywords: imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
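The interpolation step that the abstract above attributes to SMOTE can be sketched as follows. This is a minimal illustration of plain SMOTE only, not the DSMOTE method of the paper; the function name `smote_sketch` and its parameters are hypothetical and chosen for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Assumes at least two minority samples. Hypothetical helper, for
    illustration only.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already spanned by the minority class, which is also why noisy minority samples can pull synthetic points into majority territory.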
Close-Loop System Identification Using Over-sampling Scheme and Its Estimate Accuracy Analysis
2
Authors: 胡怀中, 孙连明, 刘文江. Journal of Shanghai University (English Edition) (CAS), 2005, No. 5, pp. 437-444 (8 pages)
A new identification method for a linear discrete-time closed-loop system is proposed based on an output over-sampling scheme. When the system outputs are over-sampled, the new output sequences contain more information about the plant structure. The over-sampled plant model is first identified using the general least squares (GLS) method. The original plant model is then obtained from its relationship with the over-sampled model. Compared with conventional approaches, the advantage of the new method is that even if the ordinary identifiability conditions are not satisfied, a closed-loop system can be identified from the over-sampled output without any external test signal. Accuracy analysis shows the relationship between the estimation error and the over-sampling rate. Numerical simulation illustrates its effectiveness.
Keywords: system identification; close-loop; over-sampling; estimate accuracy
Predicting Beta-Turns and Beta-Turn Types Using a Novel Over-Sampling Approach
3
Authors: Lan Anh T. Nguyen, Xuan Tho Dang, Tu Kien T. Le, Thammakorn Saethang, Vu Anh Tran, Duc Luu Ngo, Sergey Gavrilov, Ngoc Giang Nguyen, Mamoru Kubo, Yoichi Yamada, Kenji Satou. Journal of Biomedical Science and Engineering, 2014, No. 11, pp. 927-940 (14 pages)
The β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been studied for predicting β-turns and β-turn types. However, due to the imbalanced dataset, the performance is still inadequate. In this study, we proposed a novel over-sampling technique, FOST, to deal with the class-imbalance problem. Experimental results on three standard benchmark datasets showed that our method is comparable with state-of-the-art methods. In addition, we applied our algorithm to five benchmark datasets from the UCI Machine Learning Repository and achieved significant improvement in G-mean and Sensitivity. This suggests that our method is also effective for imbalanced data other than β-turns and β-turn types.
Keywords: beta-turns; beta-turn types; class imbalance; over-sampling
A novel over-sampling method and its application to miRNA prediction
4
Authors: Xuan Tho Dang, Osamu Hirose, Thammakorn Saethang, Vu Anh Tran, Lan Anh T. Nguyen, Tu Kien T. Le, Mamoru Kubo, Yoichi Yamada, Kenji Satou. Journal of Biomedical Science and Engineering, 2013, No. 2, pp. 236-248 (13 pages)
MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that play an indispensable role in gene regulation of many biological processes. Most current computational, comparative, and non-comparative methods classify human precursor microRNA (pre-miRNA) hairpins from both genome pseudo hairpins and other non-coding RNAs (ncRNAs). Although a few approaches have achieved promising results by applying class imbalance learning methods, the issue has still not been solved completely by existing methods because of the imbalanced class distribution in the datasets. For example, SMOTE is a well-known, general over-sampling method that addresses this problem; however, in some cases it cannot improve classification performance, and sometimes it reduces it. Therefore, we developed a novel over-sampling method named incremental-SMOTE to distinguish human pre-miRNA hairpins from both genome pseudo hairpins and other ncRNAs. Experimental results on pre-miRNA datasets from Batuwita et al. showed that our method achieved better Sensitivity and G-mean than the control (no over-sampling), SMOTE, and several successors of SMOTE, including safe-level-SMOTE and border-line-SMOTE. In addition, we applied the novel method to five imbalanced benchmark datasets from the UCI Machine Learning Repository and achieved improvements in Sensitivity and G-mean. These results suggest that our method outperforms SMOTE and several of its successors in various biomedical classification problems, including miRNA classification.
Keywords: imbalanced dataset; over-sampling; SMOTE; miRNA classification
An Ensemble Machine Learning Technique for Stroke Prognosis
5
Authors: Mesfer Al Duhayyim, Sidra Abbas, Abdullah Al Hejaili, Natalia Kryvinska, Ahmad Almadhor, Uzma Ghulam Mohammad. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 10, pp. 413-429 (17 pages)
Stroke is a life-threatening disease usually caused by blockage of blood or insufficient blood flow to the brain. It has a tremendous impact on every aspect of life, since it is the leading global factor of disability and morbidity. Strokes can range from minor to severe (extensive). Thus, early stroke assessment and treatment can enhance survival rates. Manual prediction is extremely time and resource intensive. Automated prediction methods such as modern Information and Communication Technologies (ICTs), particularly those in the Machine Learning (ML) area, are crucial for the early diagnosis and prognosis of stroke. Therefore, this research proposed an ensemble voting model based on three ML algorithms: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM). We apply data preprocessing to manage the outliers and useless instances in the dataset. Furthermore, to address the problem of imbalanced data, we enhance the minority class's representation using the Synthetic Minority Over-Sampling Technique (SMOTE), allowing it to engage actively in the learning process. Results reveal that the suggested model outperforms existing studies and other classifiers with 0.96% accuracy, 0.97% precision, 0.97% recall, and 0.96% F1-score. The experiment demonstrates that the proposed ensemble voting model outperforms state-of-the-art and other traditional approaches.
Keywords: stroke prediction; machine learning; ensemble model; data analysis; synthetic minority over-sampling
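An improved SMOTE algorithm for imbalanced datasets (Cited by: 25)
6
Authors: 王超学, 张涛, 马春森. 《计算机科学与探索》 (CSCD), 2014, No. 6, pp. 727-734 (8 pages)
To address the shortcomings of SMOTE (synthetic minority over-sampling technique) when synthesizing new minority-class samples, an improved SMOTE algorithm, GA-SMOTE, is proposed. The key idea is to introduce the three basic operators of genetic algorithms into SMOTE: the selection operator is used to select minority-class samples differentially, and the crossover and mutation operators are used to control the quality of the synthesized samples. GA-SMOTE is combined with the SVM (support vector machine) algorithm to handle the classification of imbalanced data. Extensive experiments on UCI datasets show that GA-SMOTE performs well in the overall quality of the synthesized samples and effectively improves the classification performance of SVM on imbalanced datasets.
Keywords: imbalanced datasets; classification; genetic operators; synthetic minority over-sampling technique (SMOTE)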
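An under-sampling-based method for detecting anomalous BGP traffic
7
Authors: 孙红艳, 张红玉. 《电子技术(上海)》, 2011, No. 1, pp. 10-12 (3 pages)
To address the imbalanced distribution of the two classes of samples in BGP data, this paper introduces an under-sampling algorithm to preprocess the training dataset and combines it with the SVM learning process, changing the sample distribution of the SVM training set to eliminate the adverse effects of the imbalance. Experimental results show that, with the under-sampling algorithm, SVM achieves better classification performance and detects anomalous BGP traffic more effectively.
Keywords: support vector machine; Border Gateway Protocol; anomalous traffic detection; under-sampling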
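The preprocessing idea in the abstract above, changing the class distribution of the training set before fitting a classifier, can be sketched with random under-sampling. The abstract does not say which under-sampling algorithm the paper uses, so random selection is an assumption here, and `random_under_sample` is a hypothetical helper name.

```python
import random


def random_under_sample(samples, labels, seed=0):
    """Randomly discard samples from the larger classes until every class
    has as many samples as the smallest class.

    Assumption: plain random under-sampling, used only as an illustration.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    # Every class is cut down to the size of the smallest one.
    target = min(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        kept = rng.sample(xs, target)
        out_samples.extend(kept)
        out_labels.extend([y] * target)
    return out_samples, out_labels
```

The balanced output would then be passed to the classifier's training routine in place of the original training set; the test set is left untouched so that evaluation reflects the real class distribution.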
Evolutionary under-sampling based bagging ensemble method for imbalanced data classification (Cited by: 11)
8
Authors: Bo SUN, Haiyan CHEN, Jiandong WANG, Hua XIE. Frontiers of Computer Science (SCIE, EI, CSCD), 2018, No. 2, pp. 331-350 (20 pages)
In the class imbalanced learning scenario, traditional machine learning algorithms that focus on optimizing overall accuracy tend to achieve poor classification performance, especially for the minority class in which we are most interested. To solve this problem, many effective approaches have been proposed. Among them, bagging ensemble methods that integrate under-sampling techniques have demonstrated better performance than some others, including bagging ensemble methods integrated with over-sampling techniques, cost-sensitive methods, etc. Although these under-sampling techniques promote diversity among the generated base classifiers with the help of random partition or sampling of the majority class, they do not take any measure to ensure the individual classification performance, which limits the achievable ensemble performance. On the other hand, evolutionary under-sampling (EUS), a novel under-sampling technique, has been successfully applied in searching for the best majority class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, in this paper we introduce it into the under-sampling bagging framework and propose an EUS-based bagging ensemble method, EUS-Bag, by designing a new fitness function that considers three factors to make EUS better suited to the framework. With our fitness function, EUS-Bag can generate a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured using recall, geometric mean and AUC all demonstrate its superior performance.
Keywords: class imbalanced problem; under-sampling; bagging; evolutionary under-sampling; ensemble learning; machine learning; data mining
Over-sampling basis expansion model aided channel estimation for OFDM systems with ICI (Cited by: 3)
9
Authors: LIU Si-yang, LIU Yuan-an, WANG Fei-fei, XIE Gang, ZHANG Ran-ran. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2008, No. 4, pp. 7-13 (7 pages)
Rapid channel variation can induce intercarrier interference (ICI) in orthogonal frequency-division multiplexing (OFDM) systems. Intercarrier interference significantly increases the difficulty of OFDM channel estimation because too many channel coefficients need to be estimated. In this article, a novel channel estimator is proposed to resolve this problem. The estimator consists of two parts: the channel parameter estimation unit (CPEU), which estimates the number of channel taps and the multipath time delays, and the channel coefficient estimation unit (CCEU), which estimates the channel coefficients using the channel parameters provided by the CPEU. In the CCEU, the over-sampling basis expansion model is used to reduce the large number of channel coefficients that need to be estimated. Finally, simulation results are given to evaluate the performance of the proposed scheme.
Keywords: OFDM; ICI; over-sampling basis expansion model (OBEM)
A Hybrid Evolutionary Under-sampling Method for Handling the Class Imbalance Problem with Overlap in Credit Classification
10
Authors: Ping Gong, Junguang Gao, Li Wang. Journal of Systems Science and Systems Engineering (SCIE, EI, CSCD), 2022, No. 6, pp. 728-752 (25 pages)
Credit risk assessment is an important task of risk management for financial institutions. Machine learning-based approaches have made promising progress in credit risk assessment by treating it as an imbalanced binary classification task. However, few efforts have been made to deal simultaneously with the class overlap problem that accompanies imbalance. To this end, this study proposes a Tomek link and genetic algorithm (GA)-based under-sampling framework (TEUS) to address the class imbalance and overlap issues in binary credit classification by eliminating majority class instances while considering multi-perspective factors. TEUS first determines boundary majority instances with Tomek links, then takes the distance from each majority instance to its nearest boundary as the radius and assigns the density of opposite-class samples within the radius as the overlap potential of that majority instance. Second, TEUS weighs each non-borderline majority instance based on its information contribution in estimating class labels. After partitioning non-borderline majority instances into subgroups according to overlap potential and information contribution, TEUS applies GA to select samples from the subgroups and merges them with the minority samples into a new training set. The design of the fitness function in GA and the grouping of the non-borderline majority instances not only trade off the multi-perspective characteristics of instances but also help reduce the computational complexity of the sampling optimization search. Numerical experiments on real-world credit data sets demonstrate the effectiveness of the proposed TEUS.
Keywords: imbalance classification; credit classification; class overlap; evolutionary under-sampling; genetic algorithm
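The abstract above relies on Tomek links to find boundary majority instances. As a minimal sketch of that building block only (not of the TEUS framework), a Tomek link is a pair of samples from opposite classes that are each other's nearest neighbour. The helper names `nearest_index` and `tomek_links` are hypothetical.

```python
import math


def nearest_index(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    candidates = (j for j in range(len(points)) if j != i)
    return min(candidates, key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links:
    mutual nearest neighbours carrying different class labels."""
    links = []
    for i in range(len(points)):
        j = nearest_index(points, i)
        if i < j and labels[i] != labels[j] and nearest_index(points, j) == i:
            links.append((i, j))
    return links
```

In an under-sampling setting, the majority-class member of each returned pair is a natural candidate for a boundary instance.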
Modelling an Efficient Clinical Decision Support System for Heart Disease Prediction Using Learning and Optimization Approaches
11
Authors: Sridharan Kannan. Computer Modeling in Engineering & Sciences (SCIE, EI), 2022, No. 5, pp. 677-694 (18 pages)
Worldwide, heart disease is considered a significant threat that substantially increases the mortality rate. Investigators therefore seek to predict the occurrence of heart disease at an earlier stage by designing a better Clinical Decision Support System (CDSS). Generally, a CDSS is used to predict an individual's heart disease and periodically update the condition of the patient. This research proposes a novel heart disease prediction system with a CDSS composed of a clustering model for noise removal to predict and eliminate outliers. Here, the Synthetic Over-sampling prediction model is integrated with the cluster concept to balance the training data, and the AdaBoost classifier model is used to predict heart disease. Then, optimization is achieved using the Adam Optimizer (AO) model with the publicly available Statlog dataset. This flow is used to construct the model, and the evaluation is done against various prevailing approaches such as Decision Tree, Random Forest, Logistic Regression, Naive Bayes and so on. The statistical analysis is done with the Wilcoxon rank-sum method to extract the p-value of the model. The observed results show that the proposed model outperforms the various existing approaches and attains efficient prediction accuracy. This model helps physicians make better decisions in complex conditions and diagnose the disease at an earlier stage. Earlier treatment in turn helps to reduce the death rate. Here, simulation is done with MATLAB 2016b, and metrics such as accuracy, precision-recall, F-measure, p-value and ROC are analyzed to show the significance of the model.
Keywords: heart disease; clinical decision support system; over-sampling; AdaBoost classifier; Adam optimizer; Wilcoxon ranking model
Non-iterative image reconstruction from sparse magnetic resonance imaging radial data without priors
12
Authors: Gengsheng L. Zeng, Edward V. DiBella. Visual Computing for Industry, Biomedicine, and Art, 2020, No. 1, pp. 84-91 (8 pages)
The state-of-the-art approaches for image reconstruction using under-sampled k-space data are compressed sensing based. They are iterative algorithms that optimize objective functions with spatial and/or temporal constraints. This paper proposes a non-iterative algorithm to estimate the un-measured data and then to reconstruct the image with the efficient filtered backprojection algorithm. The feasibility of the proposed method is demonstrated with a patient magnetic resonance imaging study. The proposed method is also compared with the state-of-the-art iterative compressed-sensing image reconstruction method using the total-variation optimization norm.
Keywords: tomographic image reconstruction; under-sampled measurements; fast magnetic resonance imaging; analytics reconstruction
An Improved Algorithm for Imbalanced Data and Small Sample Size Classification
13
Authors: Yong Hu, Dongfa Guo, Zengwei Fan, Chen Dong, Qiuhong Huang, Shengkai Xie, Guifang Liu, Jing Tan, Boping Li, Qiwei Xie. Journal of Data Analysis and Information Processing, 2015, No. 3, pp. 27-33 (7 pages)
Traditional classification algorithms do not perform well on imbalanced data sets with small sample sizes. To deal with this problem, a novel method is proposed that changes the class distribution by adding virtual samples, which are generated by the windowed regression over-sampling (WRO) method. The proposed WRO method reflects not only the additive effects but also the multiplicative effects between samples. A comparative study between the proposed method and other over-sampling methods, such as the synthetic minority over-sampling technique (SMOTE) and borderline over-sampling (BOS), on UCI datasets and a Fourier transform infrared spectroscopy (FTIR) data set is provided. Experimental results show that the WRO method achieves better performance than the other methods.
Keywords: class imbalance learning; over-sampling; high-dimensional; small-sample size; support vector machine
Increasing the Resolution and SNR of an ADC's Measurement with a Method of Over-Sampling and Averaging
14
Authors: LI Li. International Journal of Plant Engineering and Management, 2006, No. 1, pp. 65-68 (4 pages)
By analyzing the theory of over-sampling and averaging, it is concluded that, when white noise accompanies the signal, each additional bit of resolution can be achieved by a fourfold increase in sampling frequency. Each additional bit increases the SNR (signal-to-noise ratio) by approximately 6 dB.
Keywords: over-sampling; averaging; A/D converter (ADC)
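The two figures quoted in the abstract above (a fourfold sampling rate and roughly 6 dB per extra bit) can be turned into a small worked example. The sketch below assumes the common sum-and-shift form of over-sampling and averaging, which the abstract itself does not spell out; all function names are hypothetical.

```python
import math


def oversampling_ratio(extra_bits):
    """Each additional bit of resolution needs 4x more samples."""
    return 4 ** extra_bits


def snr_gain_db(extra_bits):
    """Each additional bit adds 20*log10(2), about 6.02 dB, of SNR."""
    return 20 * math.log10(2) * extra_bits


def oversample_and_decimate(samples, extra_bits):
    """Sum groups of 4**extra_bits raw ADC codes and shift right by
    extra_bits, giving results that are extra_bits wider than the input.
    Incomplete trailing groups are dropped."""
    group = oversampling_ratio(extra_bits)
    results = []
    for start in range(0, len(samples) - group + 1, group):
        total = sum(samples[start:start + group])
        results.append(total >> extra_bits)
    return results
```

For example, turning a 10-bit converter into a 12-bit measurement takes 16 raw samples per output value and corresponds to an SNR gain of about 12 dB, provided the input carries enough white noise to toggle the least significant bit.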
Impact of data balancing a multiclass dataset before the creation of association rules to study bacterial vaginosis
15
Authors: Freddy de la Cruz-Ruiz, Juana Canul-Reich, Rafael Rivera-López, Erick de la Cruz-Hernández. Intelligent Medicine (EI, CSCD), 2024, No. 3, pp. 188-199 (12 pages)
Background: Bacterial vaginosis is a polymicrobial syndrome in which the homeostasis exerted by the Lactobacillus species that protect the vaginal mucosa has been lost. This study explored the data balancing process with the intention of improving the quality of association rules. The article aimed to balance the unbalanced multiclass dataset to improve association rule creation. Methods: A preconstructed dataset with 201 observations and 58 variables was analyzed. The authors collected the data between August 2016 and October 2018 in Tabasco, Mexico. The study population comprised sexually active women aged 18 to 50 who underwent gynecological inspection at the infectious and metabolic diseases research laboratory at the Universidad Juarez Autonoma de Tabasco. To determine the best K value, the random forest algorithm was used, and the balancing was performed with the synthetic minority over-sampling technique (SMOTE), random over-sampling examples (ROSE), and adaptive synthetic sampling approach for imbalanced learning (ADASYN) algorithms. The Apriori algorithm created the rules; to select rules with statistical significance, the is.redundant(), is.significant(), and is.maximal() functions and Fisher's exact test as the quality metric were used. The biological validation was carried out by an expert (bacteriologist). Results: With the ADASYN algorithm at K=9, the out-of-bag (OOB) error was zero; this was the best K value. In the balancing process, the ADASYN algorithm showed the best performance. From the dataset balanced with ADASYN, the Apriori algorithm created the association rules; selection with Fisher's exact test as the quality metric, followed by biological validation, yielded 13 rules. Gram-bacteria Atopobium vaginae, Gardnerella vaginalis, Megasphaera phylotype 1, Mycoplasma hominis and Ureaplasma parvum were detected by the Apriori algorithm from the balanced dataset. Conclusion: Balancing may improve the creation of association rules to efficiently model the bacteria that cause bacterial vaginosis.
Keywords: bacterial vaginosis; data balancing; random forest; synthetic minority over-sampling technique
Large-scale extraction of check dams and silted fields on the Chinese loess plateau using ensemble learning models
16
Authors: Yunfei Li, Jianlin Zhao, Ke Yuan, Gebeyehu Taye, Long Li. International Soil and Water Conservation Research (SCIE, CSCD), 2024, No. 3, pp. 548-564 (17 pages)
Check dams have been widely constructed in the Chinese Loess Plateau and have played an important role in controlling soil loss during the last 70 years. However, large-scale and automatic mapping of the check dams and the resulting silted fields is lacking. In this study, we present a novel methodological framework to extract silted fields and to estimate the location of the check dams at the pixel level in the Wuding River catchment using remote sensing and ensemble learning models. The random under-sampling method and 23 features were used to train and validate three ensemble learning models, namely Random Forest, Extreme Gradient Boosting and EasyEnsemble, based on a large number of samples. The established optimal model was then applied to the whole study area to map check dams and silted fields. Our results indicate that the imbalance ratio of the samples has a significant impact on the performance of the models. Validation of the results on the testing set shows that the F1-score for silted fields of all three models is higher than 0.75 at the pixel level. Finally, we produced a map of silted fields and check dams at 10 m spatial resolution with the optimal model, with an accuracy of ca. 90% at the object level. The proposed framework can be used for large-scale and high-precision mapping of check dams and silted fields, which is of great significance for monitoring and managing the dynamics of check dams and for the quantitative evaluation of their eco-environmental benefits.
Keywords: silted field; ensemble learning; random under-sampling; imbalanced classification; Chinese Loess Plateau
A method for satellite time series anomaly detection based on fast-DTW and improved-KNN (Cited by: 12)
17
Authors: Langfu CUI, Qingzhen ZHANG, Yan SHI, Liman YANG, Yixuan WANG, Junle WANG, Chenggang BAI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 2, pp. 149-159 (11 pages)
Satellite anomaly detection faces several problems, such as unbalanced sample distribution, few fault samples, and unobvious anomaly characteristics. These problems make it difficult for existing anomaly detection methods to train an accurate classification model, and the accuracy of anomaly detection is hard to improve. At the same time, satellite monitoring data is high-dimensional, which makes it difficult to extract effective features. Based on the DTW over-sampling method, this paper over-samples the fault samples in satellite time series and constructs a balanced time series data set. The Fast-DTW method is applied to calculate the distance between different time series, which improves the speed of similarity calculation. The KNN (K-Nearest Neighbor) method is applied for classification, and the best classification result is obtained by searching for the optimal hyper-parameter k. The results show that the proposed method has high anomaly detection accuracy and a short calculation time.
Keywords: anomaly detection; Fast-DTW; KNN; over-sampling; satellite; time series
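The combination described in the abstract above, a DTW distance between time series fed into a K-Nearest Neighbor classifier, can be sketched as follows. This uses the classic quadratic-time DTW recurrence rather than the Fast-DTW approximation or the improved KNN of the paper; `dtw_distance` and `knn_predict` are hypothetical names.

```python
from collections import Counter


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def knn_predict(train, query, k=1):
    """Majority vote among the k training series closest to the query.
    train is a list of (series, label) pairs."""
    ranked = sorted(train, key=lambda item: dtw_distance(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because DTW allows one sample to align with several samples of the other series, two series with the same shape but different timing or length can still have a distance of zero, which is what makes it attractive for telemetry whose events drift in time.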
Conditional self-attention generative adversarial network with differential evolution algorithm for imbalanced data classification
18
Authors: Jiawei NIU, Zhunga LIU, Quan PAN, Yanbo YANG, Yang LI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 3, pp. 303-315 (13 pages)
Imbalanced data classification is an important research topic in real-world applications, such as fault diagnosis in an aircraft manufacturing system. The over-sampling method is often used to solve this problem. It generates samples according to the distance between minority data. However, the traditional over-sampling method may change the original data distribution, which is harmful to classification performance. In this paper, we propose a new method called Conditional Self-Attention Generative Adversarial Network with Differential Evolution (CSAGAN-DE) for imbalanced data classification. The new method aims to improve the classification performance on minority data by enhancing the quality of the generated minority data. In CSAGAN-DE, the minority data are fed into the self-attention generative adversarial network to approximate the data distribution and create new data for the minority class. Then, the differential evolution algorithm is employed to automatically determine the number of generated minority data needed to achieve satisfactory classification performance. Several experiments are conducted to evaluate the performance of the new CSAGAN-DE method. The results show that the new method can efficiently improve classification performance compared with other related methods.
Keywords: classification; generative adversarial network; imbalanced data; optimization; over-sampling
Improved hybrid resampling and ensemble model for imbalance learning and credit evaluation (Cited by: 1)
19
Authors: Gang Kou, Hao Chen, Mohammed A. Hefni. Journal of Management Science and Engineering, 2022, No. 4, pp. 511-529 (19 pages)
Clustering-based undersampling (CUS) and the distance-based near-miss method are widely used in current imbalanced learning algorithms, but these methods have certain drawbacks. In particular, CUS does not consider the influence of the distance factor on the majority instances, and the near-miss method omits the inter-class(es) within the majority samples. To overcome these drawbacks, this study proposes an undersampling method combining distance measurement and majority class clustering. Resampling methods are used to develop an ensemble-based imbalanced-learning algorithm called the clustering and distance-based imbalance learning model (CDEILM). This algorithm combines distance-based undersampling, feature selection, and ensemble learning. In addition, a cluster size-based resampling (CSBR) method is proposed for preserving the original distribution of the majority class, and a hybrid imbalanced learning framework is constructed by fusing various types of resampling methods. The combination of CDEILM and CSBR can be considered a specific case of this hybrid framework. The experimental results show that the CDEILM and CSBR methods achieve better performance than the benchmark methods, and that the hybrid model provides the best results under most circumstances. Therefore, the proposed model can be used as an alternative imbalanced learning method under specific circumstances, e.g., for providing a solution to credit evaluation problems in financial applications.
Keywords: imbalanced learning; clustering-based under-sampling; ensemble methods; hybrid methods; credit risk evaluation
Measure oriented training: a targeted approach to imbalanced classification problems (Cited by: 1)
20
Authors: Bo YUAN, Wenhuang LIU. Frontiers of Computer Science (SCIE, EI, CSCD), 2012, No. 5, pp. 489-497 (9 pages)
Since the overall prediction error of a classifier on imbalanced problems can be potentially misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques, including sampling and cost-sensitive learning, are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error-based objective functions. As a result, there is clearly a gap between the measure according to which the classifier is evaluated and how the classifier is trained. This paper investigates the prospect of explicitly using the appropriate measure itself to search the hypothesis space to bridge this gap. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
Keywords: imbalanced datasets; genetic algorithms (GAs); neural networks; G-mean; synthetic minority over-sampling technique (SMOTE)
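The abstract above contrasts overall error with G-mean and F-measure. A short sketch under the usual definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall) shows why accuracy can mislead on imbalanced data. The function names are hypothetical, and the counts in the usage note are made up for illustration.

```python
import math


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on positives) and
    specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, tn, fp):
    """Harmonic mean of precision and recall (F1); tn is unused but kept
    so that all three metrics share one signature."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

With 10 positive and 90 negative samples, a classifier that labels everything negative (tp=0, fn=10, tn=90, fp=0) reaches an accuracy of 0.9 while its G-mean and F-measure are both 0, which is the gap between evaluation measure and training objective that the abstract points to.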