The study of machine learning has revealed that it can unleash new applications in a variety of disciplines.Many limitations limit their expressiveness,and researchers are working to overcome them to fully exploit the...The study of machine learning has revealed that it can unleash new applications in a variety of disciplines.Many limitations limit their expressiveness,and researchers are working to overcome them to fully exploit the power of data-driven machine learning(ML)and deep learning(DL)techniques.The data imbalance presents major hurdles for classification and prediction problems in machine learning,restricting data analytics and acquiring relevant insights in practically all real-world research domains.In visual learning,network information security,failure prediction,digital marketing,healthcare,and a variety of other domains,raw data suffers from a biased data distribution of one class over the other.This article aims to present a taxonomy of the approaches for handling imbalanced data problems and their comparative study on the classification metrics and their application areas.We have explored very recent trends of techniques employed for solutions to class imbalance problems in datasets and have also discussed their limitations.This article has also identified open challenges for further research in the direction of class data imbalance.展开更多
Every application in a smart city environment like the smart grid,health monitoring, security, and surveillance generates non-stationary datastreams. Due to such nature, the statistical properties of data changes over...Every application in a smart city environment like the smart grid,health monitoring, security, and surveillance generates non-stationary datastreams. Due to such nature, the statistical properties of data changes overtime, leading to class imbalance and concept drift issues. Both these issuescause model performance degradation. Most of the current work has beenfocused on developing an ensemble strategy by training a new classifier on thelatest data to resolve the issue. These techniques suffer while training the newclassifier if the data is imbalanced. Also, the class imbalance ratio may changegreatly from one input stream to another, making the problem more complex.The existing solutions proposed for addressing the combined issue of classimbalance and concept drift are lacking in understating of correlation of oneproblem with the other. This work studies the association between conceptdrift and class imbalance ratio and then demonstrates how changes in classimbalance ratio along with concept drift affect the classifier’s performance.We analyzed the effect of both the issues on minority and majority classesindividually. To do this, we conducted experiments on benchmark datasetsusing state-of-the-art classifiers especially designed for data stream classification.Precision, recall, F1 score, and geometric mean were used to measure theperformance. Our findings show that when both class imbalance and conceptdrift problems occur together the performance can decrease up to 15%. Ourresults also show that the increase in the imbalance ratio can cause a 10% to15% decrease in the precision scores of both minority and majority classes.The study findings may help in designing intelligent and adaptive solutionsthat can cope with the challenges of non-stationary data streams like conceptdrift and class imbalance.展开更多
Pneumonia is an acute lung infection that has caused many fatalitiesglobally. Radiologists often employ chest X-rays to identify pneumoniasince they are presently the most effective imaging method for this purpose.Com...Pneumonia is an acute lung infection that has caused many fatalitiesglobally. Radiologists often employ chest X-rays to identify pneumoniasince they are presently the most effective imaging method for this purpose.Computer-aided diagnosis of pneumonia using deep learning techniques iswidely used due to its effectiveness and performance. In the proposed method,the Synthetic Minority Oversampling Technique (SMOTE) approach is usedto eliminate the class imbalance in the X-ray dataset. To compensate forthe paucity of accessible data, pre-trained transfer learning is used, and anensemble Convolutional Neural Network (CNN) model is developed. Theensemble model consists of all possible combinations of the MobileNetv2,Visual Geometry Group (VGG16), and DenseNet169 models. MobileNetV2and DenseNet169 performed well in the Single classifier model, with anaccuracy of 94%, while the ensemble model (MobileNetV2+DenseNet169)achieved an accuracy of 96.9%. Using the data synchronous parallel modelin Distributed Tensorflow, the training process accelerated performance by98.6% and outperformed other conventional approaches.展开更多
In medical diagnosis, the problem of class imbalance is popular. Though there are abundant unlabeled data, it is very difficult and expensive to get labeled ones. In this paper, an ensemble-based active learning algor...In medical diagnosis, the problem of class imbalance is popular. Though there are abundant unlabeled data, it is very difficult and expensive to get labeled ones. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. The artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimension. In selecting member classifiers based on misclassification cost estimation, the minority class is assigned with higher weights for misclassification costs, while each testing sample has a variable penalty factor to induce the ensemble to correct current error. In our experiments with UCI disease datasets, instead of classification accuracy, F-value and G-means are used as the evaluation rule. Compared with other ensemble methods, our method shows best performance, and needs less labeled samples.展开更多
The popularity of the Internet of Things(IoT)has enabled a large number of vulnerable devices to connect to the Internet,bringing huge security risks.As a network-level security authentication method,device fingerprin...The popularity of the Internet of Things(IoT)has enabled a large number of vulnerable devices to connect to the Internet,bringing huge security risks.As a network-level security authentication method,device fingerprint based on machine learning has attracted considerable attention because it can detect vulnerable devices in complex and heterogeneous access phases.However,flexible and diversified IoT devices with limited resources increase dif-ficulty of the device fingerprint authentication method executed in IoT,because it needs to retrain the model network to deal with incremental features or types.To address this problem,a device fingerprinting mechanism based on a Broad Learning System(BLS)is proposed in this paper.The mechanism firstly characterizes IoT devices by traffic analysis based on the identifiable differences of the traffic data of IoT devices,and extracts feature parameters of the traffic packets.A hierarchical hybrid sampling method is designed at the preprocessing phase to improve the imbalanced data distribution and reconstruct the fingerprint dataset.The complexity of the dataset is reduced using Principal Component Analysis(PCA)and the device type is identified by training weights using BLS.The experimental results show that the proposed method can achieve state-of-the-art accuracy and spend less training time than other existing methods.展开更多
Datasets with the imbalanced class distribution are difficult to handle with the standard classification algorithms.In supervised learning,dealing with the problem of class imbalance is still considered to be a challe...Datasets with the imbalanced class distribution are difficult to handle with the standard classification algorithms.In supervised learning,dealing with the problem of class imbalance is still considered to be a challenging research problem.Various machine learning techniques are designed to operate on balanced datasets;therefore,the state of the art,different undersampling,over-sampling and hybrid strategies have been proposed to deal with the problem of imbalanced datasets,but highly skewed datasets still pose the problem of generalization and noise generation during resampling.To overcome these problems,this paper proposes amajority clusteringmodel for classification of imbalanced datasets known as MCBC-SMOTE(Majority Clustering for balanced Classification-SMOTE).The model provides a method to convert the problem of binary classification into a multi-class problem.In the proposed algorithm,the number of clusters for themajority class is calculated using the elbow method and the minority class is over-sampled as an average of clustered majority classes to generate a symmetrical class distribution.The proposed technique is cost-effective,reduces the problem of noise generation and successfully disables the imbalances present in between and within classes.The results of the evaluations on diverse real datasets proved to provide better classification results as compared to state of the art existing methodologies based on several performance metrics.展开更多
With the rise of internet facilities,a greater number of people have started doing online transactions at an exponential rate in recent years as the online transaction system has eliminated the need of going to the ba...With the rise of internet facilities,a greater number of people have started doing online transactions at an exponential rate in recent years as the online transaction system has eliminated the need of going to the bank physically for every transaction.However,the fraud cases have also increased causing the loss of money to the consumers.Hence,an effective fraud detection system is the need of the hour which can detect fraudulent transactions automatically in real-time.Generally,the genuine transactions are large in number than the fraudulent transactions which leads to the class imbalance problem.In this research work,an online transaction fraud detection system using deep learning has been proposed which can handle class imbalance problem by applying algorithm-level methods which modify the learning of the model to focus more on the minority class i.e.,fraud transactions.A novel loss function named Weighted Hard-Reduced Focal Loss(WH-RFL)has been proposed which has achieved maximum fraud detection rate i.e.,True PositiveRate(TPR)at the cost of misclassification of few genuine transactions as high TPR is preferred over a high True Negative Rate(TNR)in fraud detection system and same has been demonstrated using three publicly available imbalanced transactional datasets.Also,Thresholding has been applied to optimize the decision threshold using cross-validation to detect maximum number of frauds and it has been demonstrated by the experimental results that the selection of the right thresholding method with deep learning yields better results.展开更多
Traditional classification algorithms perform not very well on imbalanced data sets and small sample size. To deal with the problem, a novel method is proposed to change the class distribution through adding virtual s...Traditional classification algorithms perform not very well on imbalanced data sets and small sample size. To deal with the problem, a novel method is proposed to change the class distribution through adding virtual samples, which are generated by the windowed regression over-sampling (WRO) method. The proposed method WRO not only reflects the additive effects but also reflects the multiplicative effect between samples. A comparative study between the proposed method and other over-sampling methods such as synthetic minority over-sampling technique (SMOTE) and borderline over-sampling (BOS) on UCI datasets and Fourier transform infrared spectroscopy (FTIR) data set is provided. Experimental results show that the WRO method can achieve better performance than other methods.展开更多
In recent years,the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age.Identifying fraud in job ...In recent years,the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age.Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting.However,the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs.This causes a reduction in the predictability and performance of traditional machine learning models.We therefore present an efficient framework that uses an oversampling technique called FJD-OT(Fake Job Description Detection Using Oversampling Techniques)to improve the predictability of detecting fake job descriptions.In the proposed framework,we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module.We then use a bag of words in combination with the term frequency-inverse document frequency(TF-IDF)approach to extract the features from the text data to create the feature dataset in the second module.Next,our framework applies k-fold cross-validation,a commonly used technique to test the effectiveness of machine learning models,that splits the experimental dataset[the Employment Scam Aegean(ESA)dataset in our study]into training and test sets for evaluation.The training set is passed through the third module,an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module.The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.展开更多
Class imbalance is a common characteristic of industrial data that adversely affects industrial data mining because it leads to the biased training of machine learning models.To address this issue,the augmentation of ...Class imbalance is a common characteristic of industrial data that adversely affects industrial data mining because it leads to the biased training of machine learning models.To address this issue,the augmentation of samples in minority classes based on generative adversarial networks(GANs)has been demonstrated as an effective approach.This study proposes a novel GAN-based minority class augmentation approach named classifier-aided minority augmentation generative adversarial network(CMAGAN).In the CMAGAN framework,an outlier elimination strategy is first applied to each class to minimize the negative impacts of outliers.Subsequently,a newly designed boundary-strengthening learning GAN(BSLGAN)is employed to generate additional samples for minority classes.By incorporating a supplementary classifier and innovative training mechanisms,the BSLGAN focuses on learning the distribution of samples near classification boundaries.Consequently,it can fully capture the characteristics of the target class and generate highly realistic samples with clear boundaries.Finally,the new samples are filtered based on the Mahalanobis distance to ensure that they are within the desired distribution.To evaluate the effectiveness of the proposed approach,CMAGAN was used to solve the class imbalance problem in eight real-world fault-prediction applications.The performance of CMAGAN was compared with that of seven other algorithms,including state-of-the-art GAN-based methods,and the results indicated that CMAGAN could provide higher-quality augmented results.展开更多
数据流分类是数据流挖掘领域一项重要研究任务,目标是从不断变化的海量数据中捕获变化的类结构.目前,几乎没有框架可以同时处理数据流中常见的多类非平衡、概念漂移、异常点和标记样本成本高昂问题.基于此,提出一种非平衡数据流在线主...数据流分类是数据流挖掘领域一项重要研究任务,目标是从不断变化的海量数据中捕获变化的类结构.目前,几乎没有框架可以同时处理数据流中常见的多类非平衡、概念漂移、异常点和标记样本成本高昂问题.基于此,提出一种非平衡数据流在线主动学习方法(Online active learning method for imbalanced data stream,OALM-IDS).AdaBoost是一种将多个弱分类器经过迭代生成强分类器的集成分类方法,AdaBoost.M2引入了弱分类器的置信度,此类方法常用于静态数据.定义了基于非平衡比率和自适应遗忘因子的训练样本重要性度量,从而使AdaBoost.M2方法适用于非平衡数据流,提升了非平衡数据流集成分类器的性能.提出了边际阈值矩阵的自适应调整方法,优化了标签请求策略.将概念漂移程度融入模型构建过程中,定义了基于概念漂移指数的自适应遗忘因子,实现了漂移后的模型重构.在6个人工数据流和4个真实数据流上的对比实验表明,提出的非平衡数据流在线主动学习方法的分类性能优于其他5种非平衡数据流学习方法.展开更多
文摘The study of machine learning has revealed that it can unleash new applications in a variety of disciplines.Many limitations limit their expressiveness,and researchers are working to overcome them to fully exploit the power of data-driven machine learning(ML)and deep learning(DL)techniques.The data imbalance presents major hurdles for classification and prediction problems in machine learning,restricting data analytics and acquiring relevant insights in practically all real-world research domains.In visual learning,network information security,failure prediction,digital marketing,healthcare,and a variety of other domains,raw data suffers from a biased data distribution of one class over the other.This article aims to present a taxonomy of the approaches for handling imbalanced data problems and their comparative study on the classification metrics and their application areas.We have explored very recent trends of techniques employed for solutions to class imbalance problems in datasets and have also discussed their limitations.This article has also identified open challenges for further research in the direction of class data imbalance.
基金The authors would like to extend their gratitude to Universiti Teknologi PETRONAS (Malaysia)for funding this research through grant number (015LA0-037).
文摘Every application in a smart city environment like the smart grid,health monitoring, security, and surveillance generates non-stationary datastreams. Due to such nature, the statistical properties of data changes overtime, leading to class imbalance and concept drift issues. Both these issuescause model performance degradation. Most of the current work has beenfocused on developing an ensemble strategy by training a new classifier on thelatest data to resolve the issue. These techniques suffer while training the newclassifier if the data is imbalanced. Also, the class imbalance ratio may changegreatly from one input stream to another, making the problem more complex.The existing solutions proposed for addressing the combined issue of classimbalance and concept drift are lacking in understating of correlation of oneproblem with the other. This work studies the association between conceptdrift and class imbalance ratio and then demonstrates how changes in classimbalance ratio along with concept drift affect the classifier’s performance.We analyzed the effect of both the issues on minority and majority classesindividually. To do this, we conducted experiments on benchmark datasetsusing state-of-the-art classifiers especially designed for data stream classification.Precision, recall, F1 score, and geometric mean were used to measure theperformance. Our findings show that when both class imbalance and conceptdrift problems occur together the performance can decrease up to 15%. Ourresults also show that the increase in the imbalance ratio can cause a 10% to15% decrease in the precision scores of both minority and majority classes.The study findings may help in designing intelligent and adaptive solutionsthat can cope with the challenges of non-stationary data streams like conceptdrift and class imbalance.
文摘Pneumonia is an acute lung infection that has caused many fatalitiesglobally. Radiologists often employ chest X-rays to identify pneumoniasince they are presently the most effective imaging method for this purpose.Computer-aided diagnosis of pneumonia using deep learning techniques iswidely used due to its effectiveness and performance. In the proposed method,the Synthetic Minority Oversampling Technique (SMOTE) approach is usedto eliminate the class imbalance in the X-ray dataset. To compensate forthe paucity of accessible data, pre-trained transfer learning is used, and anensemble Convolutional Neural Network (CNN) model is developed. Theensemble model consists of all possible combinations of the MobileNetv2,Visual Geometry Group (VGG16), and DenseNet169 models. MobileNetV2and DenseNet169 performed well in the Single classifier model, with anaccuracy of 94%, while the ensemble model (MobileNetV2+DenseNet169)achieved an accuracy of 96.9%. Using the data synchronous parallel modelin Distributed Tensorflow, the training process accelerated performance by98.6% and outperformed other conventional approaches.
文摘In medical diagnosis, the problem of class imbalance is popular. Though there are abundant unlabeled data, it is very difficult and expensive to get labeled ones. In this paper, an ensemble-based active learning algorithm is proposed to address the class imbalance problem. The artificial data are created according to the distribution of the training dataset to make the ensemble diverse, and the random subspace re-sampling method is used to reduce the data dimension. In selecting member classifiers based on misclassification cost estimation, the minority class is assigned with higher weights for misclassification costs, while each testing sample has a variable penalty factor to induce the ensemble to correct current error. In our experiments with UCI disease datasets, instead of classification accuracy, F-value and G-means are used as the evaluation rule. Compared with other ensemble methods, our method shows best performance, and needs less labeled samples.
基金supported by National Key R&D Program of China(2019YFB2102303)National Natural Science Foundation of China(NSFC61971014,NSFC11675199)Young Backbone Teacher Training Program of Henan Colleges and Universities(2021GGJS170).
文摘The popularity of the Internet of Things(IoT)has enabled a large number of vulnerable devices to connect to the Internet,bringing huge security risks.As a network-level security authentication method,device fingerprint based on machine learning has attracted considerable attention because it can detect vulnerable devices in complex and heterogeneous access phases.However,flexible and diversified IoT devices with limited resources increase dif-ficulty of the device fingerprint authentication method executed in IoT,because it needs to retrain the model network to deal with incremental features or types.To address this problem,a device fingerprinting mechanism based on a Broad Learning System(BLS)is proposed in this paper.The mechanism firstly characterizes IoT devices by traffic analysis based on the identifiable differences of the traffic data of IoT devices,and extracts feature parameters of the traffic packets.A hierarchical hybrid sampling method is designed at the preprocessing phase to improve the imbalanced data distribution and reconstruct the fingerprint dataset.The complexity of the dataset is reduced using Principal Component Analysis(PCA)and the device type is identified by training weights using BLS.The experimental results show that the proposed method can achieve state-of-the-art accuracy and spend less training time than other existing methods.
基金This research was supported by Taif University Researchers Supporting Project number(TURSP-2020/254),Taif University,Taif,Saudi Arabia.
文摘Datasets with the imbalanced class distribution are difficult to handle with the standard classification algorithms.In supervised learning,dealing with the problem of class imbalance is still considered to be a challenging research problem.Various machine learning techniques are designed to operate on balanced datasets;therefore,the state of the art,different undersampling,over-sampling and hybrid strategies have been proposed to deal with the problem of imbalanced datasets,but highly skewed datasets still pose the problem of generalization and noise generation during resampling.To overcome these problems,this paper proposes amajority clusteringmodel for classification of imbalanced datasets known as MCBC-SMOTE(Majority Clustering for balanced Classification-SMOTE).The model provides a method to convert the problem of binary classification into a multi-class problem.In the proposed algorithm,the number of clusters for themajority class is calculated using the elbow method and the minority class is over-sampled as an average of clustered majority classes to generate a symmetrical class distribution.The proposed technique is cost-effective,reduces the problem of noise generation and successfully disables the imbalances present in between and within classes.The results of the evaluations on diverse real datasets proved to provide better classification results as compared to state of the art existing methodologies based on several performance metrics.
基金This research was supported by Korea Institute for Advancement of Technology(KIAT)grant funded by the Korea Government(MOTIE)(P0012724,The Competency Development Program for Industry Specialist)and the Soonchunhyang University Research Fund.
文摘With the rise of internet facilities,a greater number of people have started doing online transactions at an exponential rate in recent years as the online transaction system has eliminated the need of going to the bank physically for every transaction.However,the fraud cases have also increased causing the loss of money to the consumers.Hence,an effective fraud detection system is the need of the hour which can detect fraudulent transactions automatically in real-time.Generally,the genuine transactions are large in number than the fraudulent transactions which leads to the class imbalance problem.In this research work,an online transaction fraud detection system using deep learning has been proposed which can handle class imbalance problem by applying algorithm-level methods which modify the learning of the model to focus more on the minority class i.e.,fraud transactions.A novel loss function named Weighted Hard-Reduced Focal Loss(WH-RFL)has been proposed which has achieved maximum fraud detection rate i.e.,True PositiveRate(TPR)at the cost of misclassification of few genuine transactions as high TPR is preferred over a high True Negative Rate(TNR)in fraud detection system and same has been demonstrated using three publicly available imbalanced transactional datasets.Also,Thresholding has been applied to optimize the decision threshold using cross-validation to detect maximum number of frauds and it has been demonstrated by the experimental results that the selection of the right thresholding method with deep learning yields better results.
文摘Traditional classification algorithms perform not very well on imbalanced data sets and small sample size. To deal with the problem, a novel method is proposed to change the class distribution through adding virtual samples, which are generated by the windowed regression over-sampling (WRO) method. The proposed method WRO not only reflects the additive effects but also reflects the multiplicative effect between samples. A comparative study between the proposed method and other over-sampling methods such as synthetic minority over-sampling technique (SMOTE) and borderline over-sampling (BOS) on UCI datasets and Fourier transform infrared spectroscopy (FTIR) data set is provided. Experimental results show that the WRO method can achieve better performance than other methods.
文摘In recent years,the detection of fake job descriptions has become increasingly necessary because social networking has changed the way people access burgeoning information in the internet age.Identifying fraud in job descriptions can help jobseekers to avoid many of the risks of job hunting.However,the problem of detecting fake job descriptions comes up against the problem of class imbalance when the number of genuine jobs exceeds the number of fake jobs.This causes a reduction in the predictability and performance of traditional machine learning models.We therefore present an efficient framework that uses an oversampling technique called FJD-OT(Fake Job Description Detection Using Oversampling Techniques)to improve the predictability of detecting fake job descriptions.In the proposed framework,we apply several techniques including the removal of stop words and the use of a tokenizer to preprocess the text data in the first module.We then use a bag of words in combination with the term frequency-inverse document frequency(TF-IDF)approach to extract the features from the text data to create the feature dataset in the second module.Next,our framework applies k-fold cross-validation,a commonly used technique to test the effectiveness of machine learning models,that splits the experimental dataset[the Employment Scam Aegean(ESA)dataset in our study]into training and test sets for evaluation.The training set is passed through the third module,an oversampling module in which the SVMSMOTE method is used to balance data before training the classifiers in the last module.The experimental results indicate that the proposed approach significantly improves the predictability of fake job description detection on the ESA dataset based on several popular performance metrics.
基金supported by the National Natural Science Foundation of China(Grant No.52375256)the Natural Science Foundation of Shanghai Municipality(Grant Nos.21ZR1431500 and 23ZR1431600).
文摘Class imbalance is a common characteristic of industrial data that adversely affects industrial data mining because it leads to the biased training of machine learning models.To address this issue,the augmentation of samples in minority classes based on generative adversarial networks(GANs)has been demonstrated as an effective approach.This study proposes a novel GAN-based minority class augmentation approach named classifier-aided minority augmentation generative adversarial network(CMAGAN).In the CMAGAN framework,an outlier elimination strategy is first applied to each class to minimize the negative impacts of outliers.Subsequently,a newly designed boundary-strengthening learning GAN(BSLGAN)is employed to generate additional samples for minority classes.By incorporating a supplementary classifier and innovative training mechanisms,the BSLGAN focuses on learning the distribution of samples near classification boundaries.Consequently,it can fully capture the characteristics of the target class and generate highly realistic samples with clear boundaries.Finally,the new samples are filtered based on the Mahalanobis distance to ensure that they are within the desired distribution.To evaluate the effectiveness of the proposed approach,CMAGAN was used to solve the class imbalance problem in eight real-world fault-prediction applications.The performance of CMAGAN was compared with that of seven other algorithms,including state-of-the-art GAN-based methods,and the results indicated that CMAGAN could provide higher-quality augmented results.
文摘数据流分类是数据流挖掘领域一项重要研究任务,目标是从不断变化的海量数据中捕获变化的类结构.目前,几乎没有框架可以同时处理数据流中常见的多类非平衡、概念漂移、异常点和标记样本成本高昂问题.基于此,提出一种非平衡数据流在线主动学习方法(Online active learning method for imbalanced data stream,OALM-IDS).AdaBoost是一种将多个弱分类器经过迭代生成强分类器的集成分类方法,AdaBoost.M2引入了弱分类器的置信度,此类方法常用于静态数据.定义了基于非平衡比率和自适应遗忘因子的训练样本重要性度量,从而使AdaBoost.M2方法适用于非平衡数据流,提升了非平衡数据流集成分类器的性能.提出了边际阈值矩阵的自适应调整方法,优化了标签请求策略.将概念漂移程度融入模型构建过程中,定义了基于概念漂移指数的自适应遗忘因子,实现了漂移后的模型重构.在6个人工数据流和4个真实数据流上的对比实验表明,提出的非平衡数据流在线主动学习方法的分类性能优于其他5种非平衡数据流学习方法.