Imbalanced data classification is the task of classifying datasets in which the number of samples differs significantly between classes. The task is common in practical scenarios such as industrial fault diagnosis, network intrusion detection, and cancer detection. In imbalanced classification, the focus is typically on achieving high recognition accuracy for the minority class. However, multi-class imbalanced datasets pose particular challenges, such as the scarcity of minority-class samples and complex inter-class relationships with overlapping boundaries, so existing methods often perform poorly on them, particularly in recognizing minority classes accurately. This paper therefore proposes CSDSResNet, a multi-class imbalanced data classification method based on a cost-sensitive dual-stream residual network. First, to address the limited number of minority-class samples, a dual-stream residual network backbone is designed to enhance the model's feature extraction capability. Next, to handle both the imbalance in inter-class sample quantities and the imbalance in inter-class overlapping boundaries, a dedicated cost-sensitive loss function is devised. This loss function places more emphasis on the minority class and on hard classes with high inter-class similarity, thereby improving the model's classification ability. Finally, the effectiveness and generalization of CSDSResNet are evaluated on two datasets, 'DryBeans' and 'Electric Motor Defects'. The experimental results show that CSDSResNet achieves the best performance on imbalanced datasets, improving the macro-F1 score by 2.9% and 1.9% on the two datasets, respectively, compared with current state-of-the-art classification methods. It also achieves the highest precision in single-class recognition of the minority class.
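The general idea of a cost-sensitive loss mentioned in the abstract above can be illustrated with a minimal sketch. This is not the CSDSResNet loss, whose exact form the abstract does not give; it is a generic inverse-frequency class-weighted cross-entropy, used here as a stand-in. The function names and the toy labels are assumptions made for illustration only.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    (Generic heuristic; NOT the weighting used by CSDSResNet.)
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical toy labels: class 0 is the majority, class 2 the minority.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = inverse_frequency_weights(labels)

# A classifier that predicts a uniform distribution over the three classes.
uniform_probs = [[1 / 3, 1 / 3, 1 / 3] for _ in labels]
loss = weighted_cross_entropy(uniform_probs, labels, weights)
```

With these weights, an error on the single class-2 sample contributes six times as much to the loss as an error on a class-0 sample, which is the sense in which such a loss "places more emphasis on the minority class".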
To deal with the data mining problem of asymmetric misclassification costs, an innovative churn prediction method is proposed, building on existing churn prediction research. The method takes the C4.5 decision tree as its baseline classifier, which yields the prediction model with minimum error rate under the assumption that all misclassifications have the same cost, and adjusts the misclassification cost on top of it to realize cost-sensitive learning. Results on customer data from a Chinese telecommunication company, where churners and non-churners have different misclassification costs, demonstrate that by altering the sampling ratio of churners to non-churners, this cost-sensitive learning method can considerably reduce the total misclassification cost produced by traditional classification methods. The method can also play an important role in strengthening the core competence of the Chinese telecommunication industry.
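The distinction drawn in the abstract above between minimum error rate and minimum total misclassification cost can be made concrete with a small sketch. The cost values, labels, and predictions below are hypothetical and are not taken from the paper; the function name is likewise an assumption.

```python
def total_misclassification_cost(y_true, y_pred, cost_fn, cost_fp):
    """Sum the cost of missed churners (FN) and false alarms (FP).

    Labels: 1 = churner, 0 = non-churner. Correct predictions cost nothing.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp


# Hypothetical data: 3 churners among 10 customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Model A makes fewer errors (2) but misses two churners.
pred_min_error = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Model B makes more errors (3) but catches every churner.
pred_cost_aware = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# Assumed costs: losing a churner is ten times worse than a false alarm.
cost_a = total_misclassification_cost(y_true, pred_min_error, cost_fn=10, cost_fp=1)
cost_b = total_misclassification_cost(y_true, pred_cost_aware, cost_fn=10, cost_fp=1)
```

Under these assumed costs, the model with the lower error rate has the higher total cost (20 versus 3), which is why a cost-sensitive method may deliberately accept more false alarms.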
Despite exploration and production success in the Niger Delta, several wells have failed because of overpressure. It is therefore essential to understand the spatial distribution of pore pressure and its generating mechanisms in order to mitigate the pitfalls that might arise during drilling. This research provides estimates of pore pressure along three offshore wells using Eaton's transit time method and two machine learning algorithms, a multi-layer perceptron artificial neural network (MLP-ANN) and random forest regression (RFR). Our results show three pressure magnitude regimes: a normal pressure zone (hydrostatic pressure), a transition pressure zone (slightly above hydrostatic pressure), and an overpressured zone (significantly above hydrostatic pressure). The top of the geopressured zone, at an average depth of 2873 mbRT (9425.853 ft), marks the onset of overpressure, with the excess pore pressure above hydrostatic pressure (P*) averaging between 1.06 and 24.75 MPa along the three wells. The results from the three methods are self-consistent, with strong correlation between Eaton's method and the two machine learning models. The models have high accuracy (above 97%), a low mean absolute percentage error (MAPE < 3%), and a high coefficient of determination (R² > 0.98). Our results also show that the principal mechanisms responsible for high pore pressure in the offshore Niger Delta are disequilibrium compaction, unloading (fluid expansion), and shale diagenesis.
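As a rough illustration of the quantities named in the abstract above, the sketch below implements the commonly quoted form of Eaton's sonic transit time relation, P_p = S - (S - P_hyd) * (Δt_normal / Δt_observed)^n, together with MAPE. The exponent n = 3 is the value usually cited for sonic data and is an assumption here, since the abstract does not state the exponent or calibration used in the study. All numeric inputs are hypothetical.

```python
def eaton_pore_pressure(overburden, hydrostatic, dt_normal, dt_observed, exponent=3.0):
    """Eaton's transit time estimate of pore pressure.

    overburden, hydrostatic: pressures in the same unit (e.g. MPa).
    dt_normal: transit time expected on the normal compaction trend.
    dt_observed: transit time measured by the sonic log.
    exponent: assumed default of 3.0 (not confirmed by the source).
    """
    ratio = dt_normal / dt_observed
    return overburden - (overburden - hydrostatic) * ratio ** exponent


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical values in MPa and microseconds per foot.
normal_zone = eaton_pore_pressure(60.0, 28.0, dt_normal=100.0, dt_observed=100.0)
overpressured_zone = eaton_pore_pressure(60.0, 28.0, dt_normal=100.0, dt_observed=125.0)
```

When the observed transit time sits on the normal compaction trend, the estimate reduces to hydrostatic pressure; a slower (larger) observed transit time pushes the estimate above hydrostatic, which is how the method flags an overpressured zone.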
Funding: supported by the Beijing Municipal Science and Technology Project (No. Z221100007122003).