Funding: Supported by the Beijing Municipal Science and Technology Project (No. Z221100007122003).
Abstract: Imbalanced data classification is the task of classifying datasets in which the number of samples differs significantly between classes. The task is common in practice, for example in industrial fault diagnosis, network intrusion detection, and cancer detection. In imbalanced classification, the focus is typically on achieving high recognition accuracy for the minority class. However, multi-class imbalanced datasets pose challenges such as scarce minority-class samples and complex inter-class relationships with overlapping boundaries, so existing methods often perform poorly on them, particularly in recognizing minority classes accurately. This paper therefore proposes CSDSResNet, a multi-class imbalanced data classification method based on a cost-sensitive dual-stream residual network. First, to address the limited number of minority-class samples, a dual-stream residual network backbone is designed to enhance the model's feature extraction capability. Next, to account for both the imbalance in inter-class sample quantities and the imbalance in inter-class overlapping boundaries, a cost-sensitive loss function is devised. This loss function places more emphasis on the minority class and on hard classes with high inter-class similarity, thereby improving the model's classification ability. Finally, the effectiveness and generalization of CSDSResNet are evaluated on two datasets, 'DryBeans' and 'Electric Motor Defects'. The experimental results show that CSDSResNet achieves the best performance on both imbalanced datasets, improving the macro F1-score by 2.9% and 1.9%, respectively, over current state-of-the-art classification methods. It also achieves the highest precision in single-class recognition of the minority class.
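The macro F1-score cited in the abstract above averages the per-class F1 values with equal weight, so a small class counts as much as a large one. The following is a minimal sketch of that metric in plain Python; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: class 1 is the minority class.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(macro_f1(y_true, y_pred))
```

On this toy input, accuracy is 4/6 while the per-class F1 values are 0.75 and 0.5, which illustrates why macro-averaged scores are preferred when the minority class matters.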
Abstract: To deal with the data mining problem of asymmetric misclassification costs, a churn prediction method is proposed that builds on existing churn prediction research. The method uses the C4.5 decision tree as the baseline classifier, which yields the prediction model with the minimum error rate under the assumption that all misclassifications have the same cost, and adjusts the misclassification cost to realize cost-sensitive learning. Churners and non-churners have different misclassification costs, and results on customer data from a Chinese telecommunication company demonstrate that, by altering the sampling ratio of churners to non-churners, this cost-sensitive learning method can considerably reduce the total misclassification cost incurred by traditional classification methods. The method can also play an important role in promoting the core competence of the Chinese telecommunication industry.
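The two quantities this abstract relies on, the total misclassification cost and the churner to non-churner sampling ratio, can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the function names, the cost values, and the choice of undersampling non-churners are all hypothetical.

```python
import random


def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Total misclassification cost; label 1 = churner, 0 = non-churner.

    cost_fn: assumed cost of predicting a churner as a non-churner.
    cost_fp: assumed cost of predicting a non-churner as a churner.
    """
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += cost_fn
        elif t == 0 and p == 1:
            cost += cost_fp
    return cost


def resample(rows, labels, ratio, seed=0):
    """Undersample non-churners so non-churners : churners is about ratio : 1."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(neg), int(round(ratio * len(pos))))
    chosen = sorted(pos + rng.sample(neg, keep))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# A missed churner is assumed to cost five times a false alarm.
print(total_cost([1, 1, 0, 0], [0, 1, 1, 0], cost_fn=5.0, cost_fp=1.0))

rows = list(range(12))
labels = [1, 1] + [0] * 10
sub_rows, sub_labels = resample(rows, labels, ratio=2)
print(sub_labels.count(1), sub_labels.count(0))
```

A cost-insensitive learner such as a decision tree trained on the resampled data shifts its decision boundary toward the class that is now better represented, which is the general idea behind realizing cost sensitivity through the sampling ratio.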
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60873077/F020107.
Abstract: In many machine learning applications, data are not free, and there is a test cost for each data item. For economic reasons, some existing works try to minimize the test cost while preserving a particular property of a given decision system. In this paper, we point out that the test cost one can afford is limited in some applications. Hence, one has to sacrifice such properties to keep the test cost within a budget. To formalize this issue, we define the test cost constraint attribute reduction problem, where the optimization objective is to minimize the conditional information entropy. This problem is a generalization of both the test-cost-sensitive attribute reduction problem and the 0-1 knapsack problem, and is therefore more challenging. We propose a heuristic algorithm based on information gain and test costs to deal with the new problem. The algorithm is tested on four UCI (University of California, Irvine) datasets with various test cost settings. Experimental results indicate the appropriate setting of the only user-specified parameter λ.
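A budget-constrained greedy selection of the kind described in this abstract could look like the sketch below. The abstract does not give the heuristic's formula, so the scoring rule used here (information gain multiplied by cost raised to the power λ, with λ ≤ 0 penalizing expensive tests) is an assumption, as are all function and parameter names.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(rows, labels, attrs):
    """H(label | attrs) in bits; each row is a tuple of attribute values."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)][y] += 1
    n = len(rows)
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            h -= (size / n) * (c / size) * math.log2(c / size)
    return h


def greedy_budgeted_reduct(rows, labels, costs, budget, lam=-1.0):
    """Greedily add attributes while the total test cost stays within budget.

    Assumed score: (entropy reduction) * cost ** lam, with positive costs.
    Returns the selected attribute indices and the remaining entropy.
    """
    selected, spent = [], 0.0
    current = conditional_entropy(rows, labels, selected)
    while True:
        best, best_score, best_h = None, 0.0, current
        for a in range(len(costs)):
            if a in selected or spent + costs[a] > budget:
                continue  # already chosen, or would exceed the budget
            h = conditional_entropy(rows, labels, selected + [a])
            score = (current - h) * costs[a] ** lam
            if score > best_score:
                best, best_score, best_h = a, score, h
        if best is None:  # nothing affordable reduces the entropy further
            return selected, current
        selected.append(best)
        spent += costs[best]
        current = best_h


# Attributes 0 and 2 both determine the label; attribute 1 is noise.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
labels = [0, 0, 1, 1]
costs = [5.0, 1.0, 2.0]
print(greedy_budgeted_reduct(rows, labels, costs, budget=3.0))
```

Because the loop stops as soon as no affordable attribute lowers the entropy on its own, this sketch can miss attributes that are only informative in combination; it is meant to show the budget constraint and the role of λ, not to reproduce the published algorithm.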
Funding: Supported by the National Basic Research Program of China (973 Program) under Grant No. 2012CB215202 and the National Natural Science Foundation of China under Grant No. 51205046.
Abstract: For face detection under complex backgrounds and illumination, this paper proposes a detection method that combines skin color segmentation with a cost-sensitive AdaBoost algorithm. First, exploiting the fact that human skin colors cluster in color space, the skin color area is extracted in the YCbCr color space and a large amount of irrelevant background is excluded. Then, to remedy the deficiencies of the AdaBoost algorithm, a cost-sensitive function is introduced into it. Finally, skin color segmentation and the cost-sensitive AdaBoost algorithm are combined for face detection. Experimental results show that the proposed method achieves a higher detection rate and detection speed and adapts better to real field environments.
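The skin color segmentation step can be illustrated with a simple threshold in the Cb-Cr plane. The sketch below uses the standard full-range BT.601 (JFIF) RGB to YCbCr conversion; the threshold ranges for Cb and Cr are commonly quoted values from the skin detection literature and are an assumption here, since the abstract does not state which ranges the paper uses.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Assumed rule: a pixel is skin if Cb and Cr fall inside fixed ranges.

    Luma (Y) is ignored, which makes the rule less sensitive to brightness.
    """
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image):
    """image: list of rows of (r, g, b) tuples -> list of rows of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


# One skin-toned pixel followed by blue, white, and green background pixels.
image = [[(224, 172, 105), (0, 0, 255), (255, 255, 255), (0, 255, 0)]]
print(skin_mask(image))
```

In a pipeline like the one described, a mask of this kind would restrict the region that the boosted classifier has to scan, which is where the speed gain would come from.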