13 articles found
Over-sampling algorithm for imbalanced data classification (cited by: 8)
1
Authors: XU Xiaolong, CHEN Wen, SUN Yanfei. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2019, No. 6, pp. 1182-1191 (10 pages)
For imbalanced datasets, the focus of classification is to identify samples of the minority class. The performance of current data mining algorithms is not good enough for processing imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE encounters the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
Keywords: imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
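The interpolation step that the abstract attributes to SMOTE can be sketched as follows. This is a minimal illustration of plain SMOTE-style interpolation only, not the DSMOTE method of the paper; the function name `smote`, its parameters and the toy data are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (hypothetical helper, plain SMOTE-style interpolation)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class (assumed data, for illustration only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=5)
```

Because every synthetic point is a convex combination of two minority samples, it always falls inside the region spanned by the existing minority data, which is one way to read the overgeneralization concern raised in the abstract.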
Close-Loop System Identification Using Over-sampling Scheme and Its Estimate Accuracy Analysis
2
Authors: HU Huaizhong (胡怀中), SUN Lianming (孙连明), LIU Wenjiang (刘文江). Journal of Shanghai University (English Edition) (CAS), 2005, No. 5, pp. 437-444 (8 pages)
A new identification method for a linear discrete-time closed-loop system is proposed based on an output over-sampling scheme. When the system outputs are over-sampled, the new output sequences contain more information about the plant structure. Using the general least squares (GLS) method, the over-sampled plant model is identified. The original plant model is then obtained from its relationship with the over-sampled model. Compared with conventional approaches, the advantage of the new method is that, even if the ordinary identifiability conditions are not satisfied, a closed-loop system can be identified from the over-sampled output without any external test signal. Accuracy analysis shows the relationship between the estimation error and the over-sampling rate. Numerical simulation illustrates its effectiveness.
Keywords: system identification; close-loop; over-sampling; estimate accuracy
A novel over-sampling method and its application to miRNA prediction
3
Authors: Xuan Tho Dang, Osamu Hirose, Thammakorn Saethang, Vu Anh Tran, Lan Anh T. Nguyen, Tu Kien T. Le, Mamoru Kubo, Yoichi Yamada, Kenji Satou. Journal of Biomedical Science and Engineering, 2013, No. 2, pp. 236-248 (13 pages)
MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that play an indispensable role in gene regulation of many biological processes. Most current computational, comparative, and non-comparative methods classify human precursor microRNA (pre-miRNA) hairpins from both genome pseudo hairpins and other non-coding RNAs (ncRNAs). Although a few approaches have achieved promising results by applying class imbalance learning methods, the issue has still not been solved completely by the existing methods because of the imbalanced class distribution in the datasets. For example, SMOTE is a well-known, general over-sampling method addressing this problem; however, in some cases it cannot improve classification performance, and sometimes it reduces it. Therefore, we developed a novel over-sampling method named incremental-SMOTE to distinguish human pre-miRNA hairpins from both genome pseudo hairpins and other ncRNAs. Experimental results on pre-miRNA datasets from Batuwita et al. showed that our method achieved better Sensitivity and G-mean than the control (no over-sampling), SMOTE, and several successors of modified SMOTE, including safe-level-SMOTE and borderline-SMOTE. In addition, we applied the novel method to five imbalanced benchmark datasets from the UCI Machine Learning Repository and achieved improvements in Sensitivity and G-mean. These results suggest that our method outperforms SMOTE and several of its successors in various biomedical classification problems, including miRNA classification.
Keywords: imbalanced dataset; over-sampling; SMOTE; miRNA; classification
Predicting Beta-Turns and Beta-Turn Types Using a Novel Over-Sampling Approach
4
Authors: Lan Anh T. Nguyen, Xuan Tho Dang, Tu Kien T. Le, Thammakorn Saethang, Vu Anh Tran, Duc Luu Ngo, Sergey Gavrilov, Ngoc Giang Nguyen, Mamoru Kubo, Yoichi Yamada, Kenji Satou. Journal of Biomedical Science and Engineering, 2014, No. 11, pp. 927-940 (14 pages)
The β-turn is one of the most important reverse turns because of its role in protein folding. Many computational methods have been studied for predicting β-turns and β-turn types. However, due to the imbalanced dataset, the performance is still inadequate. In this study, we proposed a novel over-sampling technique, FOST, to deal with the class-imbalance problem. Experimental results on three standard benchmark datasets showed that our method is comparable with state-of-the-art methods. In addition, we applied our algorithm to five benchmark datasets from the UCI Machine Learning Repository and achieved significant improvement in G-mean and Sensitivity. This means that our method is also effective for various imbalanced data other than β-turns and β-turn types.
Keywords: beta-turns; beta-turn types; class-imbalance; over-sampling
An Ensemble Machine Learning Technique for Stroke Prognosis
5
Authors: Mesfer Al Duhayyim, Sidra Abbas, Abdullah Al Hejaili, Natalia Kryvinska, Ahmad Almadhor, Uzma Ghulam Mohammad. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 10, pp. 413-429 (17 pages)
Stroke is a life-threatening disease, usually caused by a blockage of blood or insufficient blood flow to the brain. It has a tremendous impact on every aspect of life since it is the leading global factor of disability and morbidity. Strokes can range from minor to severe (extensive). Thus, early stroke assessment and treatment can enhance survival rates. Manual prediction is extremely time and resource intensive. Automated prediction methods such as modern Information and Communication Technologies (ICTs), particularly those in the Machine Learning (ML) area, are crucial for the early diagnosis and prognosis of stroke. Therefore, this research proposes an ensemble voting model based on three ML algorithms: Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LGBM). We apply data preprocessing to manage the outliers and useless instances in the dataset. Furthermore, to address the problem of imbalanced data, we enhance the minority class's representation using the Synthetic Minority Over-Sampling Technique (SMOTE), allowing it to engage in the learning process actively. Results reveal that the suggested model outperforms existing studies and other classifiers with 0.96% accuracy, 0.97% precision, 0.97% recall, and 0.96% F1-score. The experiment demonstrates that the proposed ensemble voting model outperforms state-of-the-art and other traditional approaches.
Keywords: stroke prediction; machine learning; ensemble model; data analysis; synthetic minority over-sampling
An improved SMOTE algorithm for imbalanced datasets (cited by: 24)
6
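Authors: WANG Chaoxue (王超学), ZHANG Tao (张涛), MA Chunsen (马春森). Journal of Frontiers of Computer Science and Technology (计算机科学与探索) (CSCD), 2014, No. 6, pp. 727-734 (8 pages)
To address the shortcomings of SMOTE (synthetic minority over-sampling technique) when synthesizing new minority-class samples, an improved SMOTE algorithm, GA-SMOTE, is proposed. The key to the algorithm is introducing the three basic operators of the genetic algorithm into SMOTE: the selection operator is used to select minority-class samples in a differentiated way, and the crossover and mutation operators are used to control the quality of the synthesized samples. GA-SMOTE is combined with the SVM (support vector machine) algorithm to handle the classification of imbalanced data. Extensive experiments on UCI datasets show that GA-SMOTE performs well in the overall quality of the synthesized samples and effectively improves the classification performance of SVM on imbalanced datasets.
Keywords: imbalanced datasets; classification; genetic operators; synthetic minority over-sampling technique (SMOTE)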
Modelling an Efficient Clinical Decision Support System for Heart Disease Prediction Using Learning and Optimization Approaches
7
Author: Sridharan Kannan. Computer Modeling in Engineering & Sciences (SCIE, EI), 2022, No. 5, pp. 677-694 (18 pages)
According to worldwide analysis, heart disease is considered a significant threat and substantially increases the mortality rate. Investigators therefore aim to predict the occurrence of heart disease at an earlier stage by designing a better Clinical Decision Support System (CDSS). Generally, a CDSS is used to predict an individual's heart disease and periodically update the condition of the patient. This research proposes a novel heart disease prediction system with a CDSS composed of a clustering model for noise removal to predict and eliminate outliers. Here, the Synthetic Over-sampling prediction model is integrated with the cluster concept to balance the training data, and the AdaBoost classifier model is used to predict heart disease. Then, the optimization is achieved using the Adam Optimizer (AO) model with the publicly available dataset known as the Stalog dataset. This flow is used to construct the model, and the evaluation is done against various prevailing approaches such as Decision Tree, Random Forest, Logistic Regression, Naive Bayes and so on. The statistical analysis is done with the Wilcoxon rank-sum method for extracting the p-value of the model. The observed results show that the proposed model outperforms the various existing approaches and attains efficient prediction accuracy. This model helps physicians make better decisions during complex conditions and diagnose the disease at an earlier stage. Thus, the earlier treatment process helps to reduce the death rate. Here, simulation is done with MATLAB 2016b, and metrics such as accuracy, precision-recall, F-measure, p-value and ROC are analyzed to show the significance of the model.
Keywords: heart disease; clinical decision support system; over-sampling; AdaBoost classifier; Adam optimizer; Wilcoxon ranking model
An Improved Algorithm for Imbalanced Data and Small Sample Size Classification
8
Authors: Yong Hu, Dongfa Guo, Zengwei Fan, Chen Dong, Qiuhong Huang, Shengkai Xie, Guifang Liu, Jing Tan, Boping Li, Qiwei Xie. Journal of Data Analysis and Information Processing, 2015, No. 3, pp. 27-33 (7 pages)
Traditional classification algorithms do not perform well on imbalanced data sets or small sample sizes. To deal with this problem, a novel method is proposed that changes the class distribution by adding virtual samples generated by the windowed regression over-sampling (WRO) method. The proposed WRO method reflects not only the additive effects but also the multiplicative effect between samples. A comparative study between the proposed method and other over-sampling methods, such as the synthetic minority over-sampling technique (SMOTE) and borderline over-sampling (BOS), on UCI datasets and a Fourier transform infrared spectroscopy (FTIR) data set is provided. Experimental results show that the WRO method can achieve better performance than the other methods.
Keywords: class imbalance learning; over-sampling; high-dimensional; small-sample size; support vector machine
Increasing the Resolution and SNR of an ADC's Measurement with a Method of Over-Sampling and Averaging
9
Author: LI Li. International Journal of Plant Engineering and Management, 2006, No. 1, pp. 65-68 (4 pages)
By analyzing the theory of over-sampling and averaging, it is concluded that, when white noise accompanies the signal, each additional bit of resolution can be achieved by a fourfold increase in sampling frequency. Each additional bit increases the SNR (signal-to-noise ratio) by approximately 6 dB.
Keywords: over-sampling; averaging; A/D converter (ADC)
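The arithmetic behind the fourfold rule can be checked with a short sketch. It assumes ideal uncorrelated white noise, so that averaging N samples improves SNR by 10·log10(N) dB; the function names are hypothetical and not taken from the paper.

```python
import math


def oversampling_ratio(extra_bits):
    """Oversampling factor needed for `extra_bits` additional bits of
    resolution under the fourfold-per-bit rule."""
    return 4 ** extra_bits


def snr_gain_db(ratio):
    """SNR improvement in dB from averaging `ratio` samples, assuming the
    noise is white and uncorrelated between samples."""
    return 10 * math.log10(ratio)


for bits in (1, 2, 3):
    ratio = oversampling_ratio(bits)
    print(f"{bits} extra bit(s): oversample x{ratio}, "
          f"SNR gain {snr_gain_db(ratio):.2f} dB")
```

Since 10·log10(4) is about 6.02, each fourfold increase in sampling rate corresponds to roughly 6 dB of SNR, that is, one extra bit.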
Over-sampling basis expansion model aided channel estimation for OFDM systems with ICI (cited by: 3)
10
Authors: LIU Si-yang, LIU Yuan-an, WANG Fei-fei, XIE Gang, ZHANG Ran-ran. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2008, No. 4, pp. 7-13 (7 pages)
Rapid channel variation can induce intercarrier interference (ICI) in orthogonal frequency-division multiplexing (OFDM) systems. Intercarrier interference significantly increases the difficulty of OFDM channel estimation because too many channel coefficients need to be estimated. In this article, a novel channel estimator is proposed to resolve this problem. The estimator consists of two parts: the channel parameter estimation unit (CPEU), which estimates the number of channel taps and the multipath time delays, and the channel coefficient estimation unit (CCEU), which estimates the channel coefficients using the channel parameters provided by the CPEU. In the CCEU, the over-sampling basis expansion model is used to address the large number of channel coefficients that need to be estimated. Finally, simulation results are given to evaluate the performance of the proposed scheme.
Keywords: OFDM; ICI; over-sampling basis expansion model (OBEM)
A method for satellite time series anomaly detection based on fast-DTW and improved-KNN (cited by: 10)
11
Authors: Langfu CUI, Qingzhen ZHANG, Yan SHI, Liman YANG, Yixuan WANG, Junle WANG, Chenggang BAI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 2, pp. 149-159 (11 pages)
In satellite anomaly detection, there are problems such as unbalanced sample distribution, few fault samples, and unobvious anomaly characteristics. These problems make it difficult for existing anomaly detection methods to train an accurate classification model, and the accuracy of anomaly detection is hard to improve. At the same time, satellite monitoring data is high-dimensional, which makes it difficult to extract effective features. Based on the DTW over-sampling method, this paper realizes the over-sampling of fault samples in satellite time series and constructs a time series data set with a balanced distribution. The Fast-DTW method is applied to calculate the distance between different time series, which improves the speed of similarity calculation. The KNN (K-Nearest Neighbor) method is applied for classification, and the best classification result is obtained by searching for the optimal hyper-parameter k. The results show that the proposed method has high anomaly detection accuracy and a short calculation time.
Keywords: anomaly detection; Fast-DTW; KNN; over-sampling; satellite; time series
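The combination of a DTW distance with nearest-neighbour classification can be sketched as below. This uses the classic quadratic-time DTW recurrence and a 1-nearest-neighbour rule, not the Fast-DTW approximation or the improved KNN of the paper; the function names and the toy series are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric
    sequences, using absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def nearest_label(query, labelled):
    """1-nearest-neighbour label under DTW; `labelled` is a list of
    (series, label) pairs (hypothetical helper)."""
    return min(labelled, key=lambda item: dtw_distance(query, item[0]))[1]


# Toy telemetry (assumed data): a spike that occurs at a shifted position
train = [([0, 0, 0, 0], "normal"), ([0, 5, 0, 0], "fault")]
print(nearest_label([0, 0, 5, 0], train))
```

DTW tolerates the shift in the spike's position, which a point-by-point distance would penalize; that is the usual reason for pairing it with KNN on time series.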
Conditional self-attention generative adversarial network with differential evolution algorithm for imbalanced data classification
12
Authors: Jiawei NIU, Zhunga LIU, Quan PAN, Yanbo YANG, Yang LI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 3, pp. 303-315 (13 pages)
Imbalanced data classification is an important research topic in real-world applications, such as fault diagnosis in an aircraft manufacturing system. The over-sampling method is often used to solve this problem. It generates samples according to the distance between minority data. However, the traditional over-sampling method may change the original data distribution, which is harmful to the classification performance. In this paper, we propose a new method called Conditional Self-Attention Generative Adversarial Network with Differential Evolution (CSAGAN-DE) for imbalanced data classification. The new method aims to improve the classification performance on minority data by enhancing the quality of the generated minority data. In CSAGAN-DE, the minority data are fed into the self-attention generative adversarial network to approximate the data distribution and create new data for the minority class. Then, the differential evolution algorithm is employed to automatically determine the number of generated minority data needed to achieve satisfactory classification performance. Several experiments are conducted to evaluate the performance of the new CSAGAN-DE method. The results show that the new method can efficiently improve the classification performance compared with other related methods.
Keywords: classification; generative adversarial network; imbalanced data; optimization; over-sampling
Measure oriented training: a targeted approach to imbalanced classification problems (cited by: 1)
13
Authors: Bo YUAN, Wenhuang LIU. Frontiers of Computer Science (SCIE, EI, CSCD), 2012, No. 5, pp. 489-497 (9 pages)
Since the overall prediction error of a classifier on imbalanced problems can be potentially misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques including sampling and cost sensitive learning are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error based objective functions. As a result, there is clearly a gap between the measure according to which the classifier is evaluated and how the classifier is trained. This paper investigates the prospect of explicitly using the appropriate measure itself to search the hypothesis space to bridge this gap. In the case studies, a standard three-layer neural network is used as the classifier, which is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
Keywords: imbalanced datasets; genetic algorithms (GAs); neural networks; G-mean; synthetic minority over-sampling technique (SMOTE)
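Why overall error can mislead on imbalanced data, and what G-mean measures instead, can be seen in a small sketch. It uses the standard definition of G-mean as the geometric mean of sensitivity and specificity; the helper names and the toy labels are assumptions, not material from the paper.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and specificity
    (recall on class 0) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (assumed): 95 majority samples, 5 minority samples, and a
# classifier that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy(y_true, y_pred), g_mean(y_true, y_pred))
```

The trivial classifier scores 95% accuracy while its G-mean is zero, because it never identifies a minority sample. This is the gap between evaluation measure and training objective that the abstract describes.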