Funding: Supported by the Beijing Municipal Science and Technology Project (No. Z221100007122003).
Abstract: Imbalanced data classification is the task of classifying datasets in which the number of samples differs significantly between classes. The task is prevalent in practical scenarios such as industrial fault diagnosis, network intrusion detection, and cancer detection. In imbalanced classification tasks, the focus is typically on achieving high recognition accuracy for the minority class. However, imbalanced multi-class datasets present challenges such as scarce minority-class samples and complex inter-class relationships with overlapping boundaries, so existing methods often perform poorly on multi-class imbalanced data, particularly in recognizing minority classes accurately. This paper therefore proposes CSDSResNet, a multi-class imbalanced data classification method based on a cost-sensitive dual-stream residual network. First, to address the limited number of minority-class samples, a dual-stream residual network backbone is designed to enhance the model's feature extraction capability. Next, to handle both the imbalance in inter-class sample quantities and the imbalance in inter-class boundary overlap, a cost-sensitive loss function is devised. This loss function places more emphasis on the minority class and on hard classes with high inter-class similarity, thereby improving the model's classification ability. Finally, the effectiveness and generalization of CSDSResNet are evaluated on two datasets: 'DryBeans' and 'Electric Motor Defects'. The experimental results show that CSDSResNet achieves the best performance on imbalanced datasets, improving the macro F1-score by 2.9% and 1.9% on the two datasets, respectively, compared with current state-of-the-art classification methods. It also achieves the highest precision in single-class recognition of the minority class.
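The abstract does not give the form of the CSDSResNet loss, so the following is only a generic illustration of a cost-sensitive loss: a cross-entropy whose per-class weights are inversely proportional to class frequency. The weighting rule, the function names, and the class counts are assumptions made for this sketch, and it does not model the inter-class similarity term the abstract mentions.

```python
import math

def inverse_frequency_weights(class_counts):
    """Assumed weighting rule: total / (num_classes * count), so rare classes weigh more."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]

def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])

# Hypothetical three-class dataset: one majority class and two minority classes.
counts = [900, 90, 10]
weights = inverse_frequency_weights(counts)

# The same predicted distribution is penalised differently depending on the true class.
probs = [0.7, 0.2, 0.1]
loss_majority = weighted_cross_entropy(probs, 0, weights)
loss_minority = weighted_cross_entropy(probs, 2, weights)
```

With these weights, an error on the rarest class contributes far more to the training objective than an error on the majority class, which is the basic effect a cost-sensitive loss is meant to have.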
Abstract: To deal with the data mining problem of asymmetric misclassification costs, a churn prediction method is proposed that builds on existing churn prediction research. The method uses the C4.5 decision tree as the baseline classifier. C4.5 yields the prediction model with the minimum error rate under the assumption that all misclassifications have the same cost, so the method adjusts the misclassification cost to realize cost-sensitive learning. Using customer data from a Chinese telecommunication company, in which churners and non-churners have different misclassification costs, the results demonstrate that altering the sampling ratio of churners to non-churners allows this cost-sensitive learning method to considerably reduce the total misclassification cost produced by traditional classification methods. The method can also play an important role in promoting the core competence of the Chinese telecommunication industry.
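The arithmetic behind asymmetric misclassification costs can be sketched as follows. Note that this sketch swaps in a different mechanism from the abstract: instead of resampling churners and non-churners before training C4.5, it shifts the decision threshold on a predicted churn probability, which is the expected-cost rule for two classes. The cost values, scores, and function names are hypothetical.

```python
def cost_threshold(cost_fp, cost_fn):
    """Predict churn when p * cost_fn > (1 - p) * cost_fp, i.e. p above this value."""
    return cost_fp / (cost_fp + cost_fn)

def predict(scores, threshold):
    """Label a customer as churner (1) when the churn score exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]

def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn

# Hypothetical data: 1 = churner, 0 = non-churner; a missed churner costs 5x a false alarm.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.1, 0.6, 0.2]
COST_FP, COST_FN = 1.0, 5.0

default_pred = predict(scores, 0.5)
cost_pred = predict(scores, cost_threshold(COST_FP, COST_FN))

default_cost = total_cost(y_true, default_pred, COST_FP, COST_FN)
sensitive_cost = total_cost(y_true, cost_pred, COST_FP, COST_FN)
```

On this toy data the error-minimizing threshold of 0.5 misses two churners, while the cost-aware threshold accepts more false alarms in exchange for a lower total cost.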
Funding: Project supported by the National Basic Research Program (973) of China (No. 2015CB352302) and the National Natural Science Foundation of China (Nos. U1509206 and 61472353).
Abstract: Knowledge embedding is a joint-optimization problem that simultaneously fulfills two different but correlated embedding tasks (i.e., entity embedding and relation embedding), and it is solved here in a joint embedding scheme. In this scheme, we design a joint compatibility scoring function to quantitatively evaluate relational facts with respect to entities and relations, and we incorporate the scoring function into a max-margin structure learning process that explicitly learns the embedding vectors of entities and relations using the context information of the knowledge base. By optimizing the joint problem, our design effectively captures the intrinsic topological structures in the learned embedding spaces. Experimental results demonstrate the effectiveness of our embedding scheme in characterizing the semantic correlations among different relation units and in relation prediction for knowledge inference.
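A max-margin objective over a compatibility score can be sketched as below. The abstract does not define its joint scoring function, so a translation-style score (negative distance between head + relation and tail) is used purely as a stand-in; the two-dimensional vectors and all names are hypothetical.

```python
import math

def score(head, rel, tail):
    """Stand-in compatibility score: negative L2 distance between head + rel and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))

def margin_loss(pos, neg, margin=1.0):
    """Hinge loss: a true fact should outscore a corrupted one by at least `margin`."""
    return max(0.0, margin - score(*pos) + score(*neg))

# Hypothetical 2-D embeddings for a few entities and one relation.
paris = [1.0, 0.0]
france = [1.0, 1.0]
far_entity = [3.0, 1.0]
near_entity = [1.5, 1.0]
capital_of = [0.0, 1.0]

true_fact = (paris, capital_of, france)
loss_far = margin_loss(true_fact, (paris, capital_of, far_entity))
loss_near = margin_loss(true_fact, (paris, capital_of, near_entity))
```

A corrupted fact whose tail lies far from the true one already satisfies the margin and contributes no loss, whereas a nearby corruption still produces a positive loss that training would push down.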
Funding: Supported by the National Basic Research Program of China (973 Program) under Grant No. 2012CB215202 and the National Natural Science Foundation of China under Grant No. 51205046.
Abstract: For face detection under complex backgrounds and illumination, this paper proposes a detection method that combines skin color segmentation with a cost-sensitive AdaBoost algorithm. First, exploiting the fact that human skin colors cluster in color space, the skin color area is extracted in the YCbCr color space and a large amount of irrelevant background is excluded. Then, to remedy the deficiencies of the AdaBoost algorithm, a cost-sensitive function is introduced into it. Finally, skin color segmentation and the cost-sensitive AdaBoost algorithm are combined for face detection. Experimental results show that the proposed method achieves a higher detection rate and detection speed and adapts better to real field environments.
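The skin color extraction step can be illustrated with a per-pixel test in YCbCr space. The conversion below is the full-range BT.601 formula used by JPEG; the Cb and Cr bounds are commonly cited values assumed for this sketch, not the thresholds used in the paper, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 coefficients, as in JPEG)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Assumed chrominance box: luminance is ignored, only Cb and Cr are tested."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]

def skin_mask(pixels):
    """Build a boolean mask for a 2-D list of (r, g, b) tuples."""
    return [[is_skin(*px) for px in row] for row in pixels]

# Tiny hypothetical image: a skin-like tone next to pure blue and pure green.
image = [[(224, 172, 138), (0, 0, 255), (0, 255, 0)]]
mask = skin_mask(image)
```

Only the pixels that pass the mask would be handed to the face classifier, which is how the segmentation step reduces the background the detector has to scan.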
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60873077/F020107.
Abstract: In many machine learning applications, data are not free, and there is a test cost for each data item. For economic reasons, some existing works try to minimize the test cost while preserving a particular property of a given decision system. In this paper, we point out that the test cost one can afford is limited in some applications. Hence, one has to sacrifice the respective properties to keep the test cost under a budget. To formalize this issue, we define the test cost constraint attribute reduction problem, where the optimization objective is to minimize the conditional information entropy. This problem is an essential generalization of both the test-cost-sensitive attribute reduction problem and the 0-1 knapsack problem, and is therefore more challenging. We propose a heuristic algorithm based on information gain and test costs to deal with the new problem. The algorithm is tested on four UCI (University of California, Irvine) datasets with various test cost settings. Experimental results indicate the appropriate setting of the only user-specified parameter λ.
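A budget-constrained greedy selection of attributes can be sketched as follows. The abstract only says the heuristic combines information gain and test costs with one parameter λ, so the scoring rule used here, gain multiplied by cost raised to the power of minus λ, is an assumption, as are the toy decision table and all function names.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(rows, attrs, label):
    """H(label | attrs) in bits, computed over a list of dict rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[label])
    n = len(rows)
    h = 0.0
    for labels in groups.values():
        for count in Counter(labels).values():
            p = count / len(labels)
            h -= (len(labels) / n) * p * math.log2(p)
    return h

def greedy_reduct(rows, costs, label, budget, lam=1.0):
    """Repeatedly add the affordable attribute with the best gain * cost**(-lam)."""
    chosen, spent = [], 0.0
    while True:
        base = conditional_entropy(rows, chosen, label)
        best, best_score = None, 0.0
        for attr, cost in costs.items():
            if attr in chosen or spent + cost > budget:
                continue  # already used, or would exceed the test-cost budget
            gain = base - conditional_entropy(rows, chosen + [attr], label)
            attr_score = gain * cost ** (-lam)
            if attr_score > best_score:
                best, best_score = attr, attr_score
        if best is None:
            return chosen, spent
        chosen.append(best)
        spent += costs[best]

# Hypothetical decision table: y equals a and c; b is uninformative; c is too expensive.
rows = [
    {"a": 0, "b": 0, "c": 0, "y": 0},
    {"a": 0, "b": 1, "c": 0, "y": 0},
    {"a": 1, "b": 0, "c": 1, "y": 1},
    {"a": 1, "b": 1, "c": 1, "y": 1},
]
costs = {"a": 2, "b": 1, "c": 10}
selected, spent = greedy_reduct(rows, costs, "y", budget=3)
```

With a budget of 3 the sketch selects only attribute a, since c exceeds the budget and b offers no reduction in conditional entropy. Larger values of `lam` penalize expensive tests more strongly.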
Funding: Supported in part by the State Grid Corporation of China project "Research on high penetrated renewable energy oriented intelligent identification for curtailment impacts and aid decision-making for promoting consumption in regional power grids" (No. 5108-202135035A-0-0-00).
Abstract: To secure power system operations, practical industrial dispatch places a steady power transfer limit on critical inter-corridors instead of using high-dimensional and strongly nonlinear stability constraints. However, computational complexity leads to over-conservative pre-set transfer limits, which in turn induce undesirable, non-technical congestion of power transfer. To overcome this barrier, a scenario-classification hybrid-based banding method is proposed. A clustering technique is adopted to group similar operating conditions in a dataset of historical and generated operating conditions. With a practical rule, a transfer limit is approximated for each operating cluster. Then, toward an interpretable online transfer limit decision, cost-sensitive learning is applied to identify the cluster affiliation of a given operating condition and assign it a transfer limit. In this stage, the critical variables that affect the transfer limit are also picked out via the mean impact value. This enables the construction of low-complexity, dispatcher-friendly rules for fast determination of the transfer limit. Numerical case studies on the IEEE 39-bus system and a real-world regional power system in China illustrate the effectiveness and conservativeness of the proposed method.
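The variable-screening step can be illustrated with the usual definition of mean impact value: each input is scaled up and down by a fixed fraction, and the resulting change in model output is averaged over the samples. The 10% perturbation, the linear toy model, and the sample values are assumptions for this sketch; the abstract does not specify the model or the perturbation size.

```python
def mean_impact_values(model, samples, delta=0.1):
    """For each feature, average model(x with feature * (1 + delta)) - model(x with feature * (1 - delta))."""
    n_features = len(samples[0])
    mivs = []
    for j in range(n_features):
        diffs = []
        for x in samples:
            up = list(x)
            down = list(x)
            up[j] = x[j] * (1 + delta)
            down[j] = x[j] * (1 - delta)
            diffs.append(model(up) - model(down))
        mivs.append(sum(diffs) / len(diffs))
    return mivs

def toy_model(x):
    """Hypothetical trained model: output depends strongly on x[0], weakly on x[1], not on x[2]."""
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]

# Hypothetical operating conditions with three variables each.
samples = [[1.0, 2.0, 5.0], [2.0, 1.0, 4.0], [3.0, 3.0, 3.0]]
mivs = mean_impact_values(toy_model, samples)

# Rank variables from most to least influential by absolute mean impact value.
ranking = sorted(range(len(mivs)), key=lambda j: abs(mivs[j]), reverse=True)
```

Variables at the top of the ranking are the candidates for the low-complexity rules the abstract describes, while those with near-zero impact can be dropped.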