Funding: Acknowledgements. We would like to express our gratitude to both the associate editor and the anonymous reviewers for their constructive comments, which greatly improved the quality of our manuscript. This work was supported by the National Natural Science Foundation of China (Grant No. 61501229) and the Fundamental Research Funds for the Central Universities (NS2015091, NS2014067, NJ20160013).
Abstract: In class-imbalanced learning, traditional machine learning algorithms that optimize overall accuracy tend to classify poorly, especially on the minority class, which is usually the class of most interest. Many effective approaches have been proposed to solve this problem. Among them, bagging ensemble methods integrated with under-sampling techniques have demonstrated better performance than several alternatives, including bagging ensemble methods integrated with over-sampling techniques and cost-sensitive methods. Although these under-sampling techniques promote diversity among the generated base classifiers through random partitioning or sampling of the majority class, they take no measure to ensure the classification performance of each individual classifier, which limits the achievable ensemble performance. On the other hand, evolutionary under-sampling (EUS), a novel under-sampling technique, has been successfully applied to search for the best majority class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, in this paper we introduce it into the under-sampling bagging framework and propose an EUS-based bagging ensemble method (EUS-Bag), designing a new fitness function that considers three factors to make EUS better suited to the framework. With our fitness function, EUS-Bag can generate a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured using recall, geometric mean and AUC all demonstrate its superior performance.
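The following is a minimal sketch of the under-sampling bagging framework that the abstract builds on, not of EUS-Bag itself. The evolutionary search and the three-factor fitness function are not specified in the abstract, so plain random under-sampling stands in for them here. The function names, the 1-nearest-neighbor base classifier and the majority vote are illustrative assumptions.

```python
import random
from collections import Counter


def balanced_bags(X, y, n_bags=5, minority=1, seed=0):
    """Build training sets that pair every minority sample with an
    equally sized random subset of the majority class.
    (Random selection is a stand-in; EUS would search for the subset.)"""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    bags = []
    for _ in range(n_bags):
        chosen = rng.sample(maj_idx, min(len(min_idx), len(maj_idx)))
        idx = min_idx + chosen
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags


def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in train_X]
    return train_y[dists.index(min(dists))]


def predict_ensemble(bags, x):
    """Majority vote over one 1-NN classifier per bag."""
    votes = Counter(predict_1nn(bx, by, x) for bx, by in bags)
    return votes.most_common(1)[0][0]


# Toy data: six majority points near 0..5, two minority points near 10.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (10.0,), (11.0,)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
bags = balanced_bags(X, y)
```

Each bag is balanced, so every base classifier sees the minority class at full weight, while the differing majority subsets supply the diversity between classifiers.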
Funding: Supported by the National Key R&D Program of China under Grant No. 2019YFB1404600; Beijing Natural Science Funds under Grant No. 9162003; and Beijing's "High-grade, Precision and Advanced Discipline Construction (Municipal) - Business Administration" project under Grant No. 19008022065.
Abstract: Credit risk assessment is an important risk management task for financial institutions. Machine learning-based approaches have made promising progress in credit risk assessment by treating it as an imbalanced binary classification task. However, few efforts have been made to deal simultaneously with the class overlap problem that accompanies imbalance. To this end, this study proposes a Tomek link and genetic algorithm (GA)-based under-sampling framework (TEUS) that addresses the class imbalance and overlap issues in binary credit classification by eliminating majority class instances while considering multi-perspective factors. First, TEUS determines boundary majority instances with Tomek links, then takes the distance from each majority instance to its nearest boundary as the radius and assigns the density of opposite-class samples within the radius as the overlap potential of that majority instance. Second, TEUS weighs each non-borderline majority instance based on its information contribution in estimating class labels. After partitioning non-borderline majority instances into subgroups according to overlap potential and information contribution, TEUS applies the GA to select samples from the subgroups and merges them with the minority samples into a new training set. The design of the GA fitness function and the grouping of the non-borderline majority instances not only trade off the multi-perspective characteristics of instances but also help reduce the computational complexity of the sampling optimization search. Numerical experiments on real-world credit data sets demonstrate the effectiveness of the proposed TEUS.
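The boundary-detection step can be illustrated with a short sketch. It covers only the standard Tomek link definition (two samples of opposite classes that are each other's nearest neighbor). The overlap potential, information contribution and GA selection steps of TEUS are not detailed enough in the abstract to reproduce. The function names and the toy data are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(X, i):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: sq_dist(X[i], X[j]))


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, of opposite-class samples
    that are mutual nearest neighbors."""
    links = []
    for i in range(len(X)):
        j = nearest(X, i)
        if i < j and y[i] != y[j] and nearest(X, j) == i:
            links.append((i, j))
    return links


# Toy data: the majority sample at 2.9 sits right next to the minority sample at 3.0.
X = [(0.0,), (1.0,), (2.9,), (3.0,), (6.0,)]
y = [0, 0, 0, 1, 1]
links = tomek_links(X, y)
```

In this toy set only the pair at 2.9 and 3.0 forms a link, so the majority sample at index 2 would be treated as a boundary instance.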
Funding: Supported by the American Heart Association, No. 18AJML34280074.
Abstract: The state-of-the-art approaches for image reconstruction from under-sampled k-space data are based on compressed sensing. They are iterative algorithms that optimize objective functions with spatial and/or temporal constraints. This paper proposes a non-iterative algorithm that estimates the unmeasured data and then reconstructs the image with the efficient filtered backprojection algorithm. The feasibility of the proposed method is demonstrated with a patient magnetic resonance imaging study. The proposed method is also compared with the state-of-the-art iterative compressed-sensing image reconstruction method using the total-variation optimization norm.
Abstract: Clustering-based undersampling (CUS) and the distance-based near-miss method are widely used in current imbalanced learning algorithms, but both have drawbacks. In particular, CUS does not consider the influence of the distance factor on the majority instances, and the near-miss method omits the inter-class(es) within the majority samples. To overcome these drawbacks, this study proposes an undersampling method combining distance measurement and majority class clustering. Resampling methods are used to develop an ensemble-based imbalanced learning algorithm called the clustering and distance-based imbalance learning model (CDEILM). This algorithm combines distance-based undersampling, feature selection, and ensemble learning. In addition, a cluster size-based resampling (CSBR) method is proposed for preserving the original distribution of the majority class, and a hybrid imbalanced learning framework is constructed by fusing various types of resampling methods. The combination of CDEILM and CSBR can be considered a specific case of this hybrid framework. The experimental results show that the CDEILM and CSBR methods achieve better performance than the benchmark methods, and that the hybrid model provides the best results under most circumstances. Therefore, the proposed model can be used as an alternative imbalanced learning method under specific circumstances, e.g., for providing a solution to credit evaluation problems in financial applications.
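One way to read "preserving the original distribution of the majority class" is to draw from each majority cluster in proportion to its size. The sketch below illustrates that reading only. It is an assumption about how a cluster size-based scheme could work, not the CSBR method as published, and it takes cluster assignments as given rather than computing them. The function name and the largest-remainder rounding are hypothetical choices.

```python
import random
from collections import Counter, defaultdict


def cluster_proportional_undersample(cluster_ids, n_keep, seed=0):
    """Pick n_keep sample indices so that each cluster's share of the
    kept samples matches its share of the original majority class.
    Requires n_keep <= len(cluster_ids)."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, c in enumerate(cluster_ids):
        members[c].append(idx)
    total = len(cluster_ids)

    # Ideal (fractional) quota per cluster, rounded down first.
    exact = {c: n_keep * len(m) / total for c, m in members.items()}
    quota = {c: int(q) for c, q in exact.items()}

    # Hand the remaining slots to the clusters with the largest remainders.
    leftover = n_keep - sum(quota.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1

    kept = []
    for c, m in members.items():
        kept.extend(rng.sample(m, quota[c]))
    return sorted(kept)


# Toy majority class: three clusters of sizes 6, 3 and 1.
cluster_ids = [0] * 6 + [1] * 3 + [2] * 1
kept = cluster_proportional_undersample(cluster_ids, n_keep=7)
kept_per_cluster = Counter(cluster_ids[i] for i in kept)
```

With seven samples kept out of ten, the clusters contribute 4, 2 and 1 samples, which is close to their original 6:3:1 proportions.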
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61627804, 61371166, 61422107, 61571215, and 61671236) and the Natural Science Foundation of Jiangsu Province, China (Nos. BK20140610 and BK20160634).
Abstract: Compared with conventional cameras, spectral imagers provide many more features in the spectral domain. They have been used in various fields such as material identification, remote sensing, precision agriculture, and surveillance. Traditional imaging spectrometers generally use scanning systems, which cannot meet the demands of dynamic scenarios. This limits the practical applications of spectral imaging. Recently, with the rapid development of computational photography theory and semiconductor techniques, spectral video acquisition has become feasible. This paper reviews the state-of-the-art spectral imaging technologies, especially those capable of capturing spectral videos. Finally, we evaluate the performance of the existing spectral acquisition systems and discuss trends for future work.
Funding: Supported by the National Natural Science Foundation of China (No. 41907048), the Fundamental Research Funds for the Central Universities, CHD (No. 300102260206), and the Shaanxi Academy of Forestry (No. SXLK2023-02-15).
Abstract: Check dams have been widely constructed in the Chinese Loess Plateau and have played an important role in controlling soil loss over the last 70 years. However, large-scale, automatic mapping of the check dams and the resulting silted fields is lacking. In this study, we present a novel methodological framework to extract silted fields and to estimate the location of the check dams at the pixel level in the Wuding River catchment using remote sensing and ensemble learning models. The random under-sampling method and 23 features were used to train and validate three ensemble learning models, namely Random Forest, Extreme Gradient Boosting and EasyEnsemble, on a large number of samples. The established optimal model was then applied to the whole study area to map check dams and silted fields. Our results indicate that the imbalance ratio of the samples has a significant impact on the performance of the models. Validation on the testing set shows that the F1-score for silted fields is higher than 0.75 at the pixel level for all three models. Finally, we produced a map of silted fields and check dams at 10 m spatial resolution with the optimal model, with an accuracy of ca. 90% at the object level. The proposed framework can be used for large-scale, high-precision mapping of check dams and silted fields, which is of great significance for monitoring and managing the dynamics of check dams and for the quantitative evaluation of their eco-environmental benefits.
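Two ingredients named in the abstract, random under-sampling to a chosen imbalance ratio and the F1-score, can be sketched in a few lines. This is an illustrative sketch under assumptions: the function names, the definition of the ratio as negatives per positive, and the toy labels are not taken from the study, and none of its features or models are reproduced.

```python
import random


def undersample_to_ratio(y, ratio, positive=1, seed=0):
    """Return sorted indices that keep every positive sample and a random
    subset of negatives so that negatives / positives is about `ratio`."""
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(y) if v == positive]
    neg = [i for i, v in enumerate(y) if v != positive]
    n_neg = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + rng.sample(neg, n_neg))


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 positive samples and 10 negative samples.
y = [1, 1] + [0] * 10
idx = undersample_to_ratio(y, ratio=3)      # keeps 2 positives + 6 negatives
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])  # one TP, one FP, one FN
```

Varying `ratio` and retraining is one simple way to probe how strongly the imbalance ratio affects a model, which is the sensitivity the abstract reports.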