The performance of traditional imbalanced classification algorithms degrades on highly imbalanced data, and handling such data remains a difficult problem. In this paper, the authors propose an ensemble tree classifier for highly imbalanced data classification. The ensemble tree classifier is constructed with a complete binary tree structure. A mathematical model is established based on the features and classification performance of the classifier, and it is proven that the model parameters of the ensemble classifier can be solved by calculation. First, the AdaBoost method is used as the benchmark classifier to construct the tree structure model. Then, the classification cost of the model is calculated, and a quantitative mathematical description of the relationship between the cost and the features of the ensemble tree classifier model is obtained. Next, the cost of the classification model is formulated as an optimization problem, and the parameters of the ensemble tree classifier are obtained through theoretical derivation. The approach is tested on several highly imbalanced datasets from different fields, with the AUC (area under the curve) and F-measure as evaluation criteria. Compared with traditional imbalanced classification algorithms, the ensemble tree classifier achieves better classification performance.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61976198, the Natural Science Research Key Project for Colleges and Universities of Anhui Province under Grant No. KJ2019A0726, and the High-level Scientific Research Foundation for the Introduction of Talent of Hefei Normal University under Grant No. 2020RCJJ44.
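The two evaluation criteria named in the abstract, AUC and F-measure, can be illustrated with a minimal sketch. It does not reproduce the authors' ensemble tree classifier or their experiments; the function names `f_measure` and `auc`, the toy labels, and the scores are all hypothetical and chosen only to show why these criteria are preferred over plain accuracy when the positive (minority) class is rare.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-measure of the positive (minority) class, which is labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No minority sample was found, so precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is quadratic in the sample
    count, which is acceptable for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 minority samples among 20 (imbalance ratio 1:9).
y_true = [1, 1] + [0] * 18
scores = [0.9, 0.4, 0.6] + [0.1] * 17            # assumed classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

all_negative = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)

print("accuracy of always predicting the majority class:", accuracy)
print("F-measure of always predicting the majority class:", f_measure(y_true, all_negative))
print("F-measure at threshold 0.5:", f_measure(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

In this toy setting, a classifier that always predicts the majority class reaches 90% accuracy yet has an F-measure of 0, since it never identifies a minority sample. The F-measure depends on the chosen decision threshold, whereas the AUC summarizes ranking quality over all thresholds, which is one reason the two are often reported together.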