Abstract: Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods have been used in petroleum-related studies, but they have not been applied to reservoir identification or to production prediction based on reservoir identification. Production forecasting studies are typically based on the overall reservoir thickness, and they lose accuracy when the reservoir contains water or dry layers that produce no oil. In this paper, a systematic ML method was developed that uses classification models for reservoir identification and regression models for production prediction. The production models are built on the reservoir identification results. For reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry, and three grades of oil layer (I, II, and III oil layers). The validation and test results of the seven optimized methods suggest that the three ensemble methods perform better than the four single methods in reservoir identification. XGBoost produced the most accurate model, with accuracy up to 99%. The effective thickness of the I and II oil layers determined during reservoir identification was then fed into the production prediction models. Because effective thickness accounts for the distribution of water and oil, it yields a more reasonable production prediction than the overall reservoir thickness. To validate the superiority of the ML methods, reference models using the overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models in every evaluation metric, and their prediction accuracy was 10% higher than that of the reference models. Free of the personal error and data distortion present in traditional methods, this system analyzes data rapidly and reduces the time required for reservoir classification and production prediction. The ML models using the effective thickness obtained from reservoir identification predicted oil production more accurately than previous studies based on the overall reservoir thickness.
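The two-stage idea described above can be sketched as follows. First each layer is classified. Then production is regressed only on the thickness of the productive layers. This is a minimal illustration under stated assumptions, not the authors' pipeline. The label strings, the `Layer` type, the one-feature least-squares fit and the toy well values are all hypothetical. The paper's actual classifiers (such as XGBoost) and regression models are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical label names for the five layer classes in the abstract
# ("I_oil", "II_oil", "III_oil", "water", "dry").
PRODUCTIVE = {"I_oil", "II_oil"}  # only grade I and II oil layers count as effective


@dataclass
class Layer:
    label: str        # class assigned by a (not shown) identification model
    thickness: float  # layer thickness; unit assumed to be metres


def overall_thickness(layers):
    """Reference-model input: total thickness, including water and dry layers."""
    return sum(layer.thickness for layer in layers)


def effective_thickness(layers):
    """Thickness of the layers classified as I or II oil layers only."""
    return sum(layer.thickness for layer in layers if layer.label in PRODUCTIVE)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


if __name__ == "__main__":
    # Toy well with hypothetical predicted labels.
    well = [Layer("I_oil", 3.0), Layer("water", 5.0), Layer("II_oil", 2.0), Layer("dry", 1.5)]
    print(effective_thickness(well), overall_thickness(well))  # 5.0 11.5
```

You can build the reference model the abstract compares against by fitting on `overall_thickness` instead of `effective_thickness`. The water and dry layers then inflate the input even though they contribute no oil.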
Abstract: This paper proposes an adaptive and diverse hybrid-based ensemble method to improve the performance of binary classification. The proposed method combines base models non-linearly and adaptively selects the most suitable model for each data instance. Ensemble methods, an important machine learning technique, use multiple single models to construct a hybrid model, which generally performs better than a single individual model. On a given dataset, diverse single models trained with different machine learning algorithms differ in their ability to recognize patterns in the training sample. The proposed approach was validated on the Repeat Buyers Prediction and Census Income Prediction datasets. The experimental results show up to an 18.5% improvement in F1 score on the Repeat Buyers dataset compared with the best individual model. This improvement also indicates that the proposed ensemble method is exceptionally good at dealing with imbalanced datasets. In addition, the proposed method outperforms two other commonly used ensemble methods (Averaging and Stacking) in F1 score. Finally, our results achieved a slightly higher AUC score of 0.718, compared with the previous result of 0.712 in the Repeat Buyers competition. This roughly 1% increase in AUC is significant for a dataset as large as Repeat Buyers.
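The abstract does not give the selection rule for "the most suitable model for each data instance." For illustration, the sketch below swaps in a standard technique: dynamic classifier selection by local accuracy (DCS-LA). For each query, every base model is scored on the query's nearest validation neighbours, and the locally best model makes the prediction. The sketch also includes the F1 score that the abstract reports. The toy models and data are hypothetical. The non-linear combination step of the paper is not reproduced.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_and_predict(x, models, X_val, y_val, k=3):
    """Dynamic classifier selection by local accuracy (DCS-LA).

    Finds the k validation points nearest to x, scores each base model by its
    accuracy on that neighbourhood, and lets the best-scoring model predict x.
    """
    neighbours = sorted(range(len(X_val)), key=lambda i: math.dist(x, X_val[i]))[:k]

    def local_accuracy(model):
        return sum(model(X_val[i]) == y_val[i] for i in neighbours) / len(neighbours)

    best = max(models, key=local_accuracy)
    return best(x)


if __name__ == "__main__":
    # Two deliberately biased toy "models", each right in only one region.
    always_pos = lambda x: 1
    always_neg = lambda x: 0
    X_val = [(0, 0), (0.1, 0), (0, 0.1), (1, 1), (0.9, 1), (1, 0.9)]
    y_val = [0, 0, 0, 1, 1, 1]
    queries = [(0.05, 0.05), (0.95, 0.95)]
    preds = [select_and_predict(q, [always_pos, always_neg], X_val, y_val) for q in queries]
    print(preds)  # [0, 1]
```

Neither toy model is useful on its own. Per-instance selection still classifies both queries correctly, which is the intuition behind adaptive selection.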
Funding: supported by the Taishan Scholars (No. ts201712003).
Abstract: Nitrogen dioxide (NO₂) poses a critical potential risk to environmental quality and public health. A reliable machine learning (ML) forecasting framework can provide valuable information to support government decision-making. Based on data from 1609 air quality monitors across China from 2014 to 2020, this study designed an ensemble ML model for time-sensitive prediction over a wide area. The model integrates multiple types of spatiotemporal variables and three sub-models. It adds a residual connection to a gated recurrent unit (GRU) network and combines the strengths of a Transformer, extreme gradient boosting (XGBoost) and the GRU with residual connection. On the test set, it achieves a root mean square error 4.1% ± 1.0% lower than that of XGBoost. The ensemble model shows strong prediction performance, with coefficients of determination (R²) of 0.91, 0.86 and 0.77 for 1-hr, 3-hr and 24-hr averages on the test set, respectively. In particular, it performs well with low spatial uncertainty in Central, East and North China, the zones with the densest monitoring sites. An interpretability analysis based on Shapley values at different temporal resolutions showed that atmospheric chemical processes contribute more to hourly predictions than to daily predictions, whereas meteorological conditions become more prominent at the daily scale. Compared with existing models at different spatiotemporal scales, the present model can be implemented at any air quality monitoring station across China to provide rapid and dependable NO₂ forecasts, which will help in developing effective control policies.
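The abstract does not say exactly where the residual connection attaches to the GRU. This minimal sketch assumes a common choice: the step input is added back to the GRU's new hidden state, which requires equal input and hidden sizes. It uses a standard GRU formulation in which the reset gate scales the recurrent term of the candidate state. The sketch uses plain Python lists and hypothetical parameter names (`Wz`, `Uz`, `bz`, ...). It does not cover the Transformer or XGBoost sub-models, or how the authors combine the three.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(*vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def gru_residual_step(x, h, p):
    """One GRU step followed by a residual (skip) connection from the input.

    p holds weight matrices W*, U* and bias vectors b* for the update (z),
    reset (r) and candidate (n) paths. The skip connection needs len(x) == len(h).
    """
    z = [sigmoid(v) for v in vadd(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(v) for v in vadd(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    # Candidate state: the reset gate scales the recurrent contribution.
    n = [
        math.tanh(wx + ri * uh + b)
        for wx, ri, uh, b in zip(matvec(p["Wn"], x), r, matvec(p["Un"], h), p["bn"])
    ]
    h_new = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
    output = vadd(h_new, x)  # residual connection: add the input back
    return h_new, output


if __name__ == "__main__":
    d = 2
    zeros = [[0.0] * d for _ in range(d)]
    params = {k: zeros for k in ("Wz", "Uz", "Wr", "Ur", "Wn", "Un")}
    params.update({k: [0.0] * d for k in ("bz", "br", "bn")})
    # With all-zero weights, z = r = 0.5 and n = 0, so h_new = 0.5 * h.
    print(gru_residual_step([1.0, 2.0], [2.0, 4.0], params))  # ([1.0, 2.0], [2.0, 4.0])
```

The skip path gives the network an identity route from input to output. This is the usual motivation for residual connections in deep or recurrent stacks.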
Funding: This study was supported by the National Key Research and Development Program of China (No. 2017YFB0304100) and Key Projects of the National Natural Science Foundation of China (No. 51634002).
Abstract: To address the insufficient accuracy of strip flatness prediction at the outlet of cold tandem rolling, this study compared how well different ensemble methods predict strip flatness and established a high-precision ensemble model for predicting outlet strip flatness. First, bagging, boosting and stacking ensemble experiments were carried out. The base models were linear regression (LR), K-nearest neighbors (KNN), support vector regression, regression trees (RT) and a backpropagation neural network (BPN). Second, three existing ensemble models were tested and their results compared: random forest, extremely randomized trees (ET) and extreme gradient boosting. The results show that bagging, boosting and stacking each improve the prediction accuracy of the regression tree model the most, by 5.28%, 6.51% and 5.32%, respectively. Stacking also improves both simple and complex base models. Its improvement is greatest for the simple base model, with accuracy 4.69% higher than that of the KNN base model. Comparing all of the ensemble models, the stacking model with level-1 learners (ET, AdaBoost-RT, LR, BPN) and a level-2 LR learner (EALB-LR) was found to be the best model and merits further study for industrial applications.
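The stacking scheme in the abstract has level-1 base learners whose predictions feed a level-2 linear regression. It can be sketched as below, assuming standard out-of-fold stacking. To keep the example self-contained, it uses only two toy one-feature base learners (KNN and LR) instead of ET, AdaBoost-RT, LR and BPN. It also uses interleaved folds and toy data, all of which are illustrative assumptions rather than the paper's setup.

```python
def knn_fit(X, y, k=3):
    """1-D k-nearest-neighbour regressor: mean target of the k closest inputs."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def lr_fit(X, y):
    """1-D ordinary least squares y = a + b * x."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(X, y)) / sum((u - mx) ** 2 for u in X)
    a = my - b * mx
    return lambda x: a + b * x


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][j] * w[j] for j in range(r + 1, n))) / M[r][r]
    return w


def least_squares(F, y):
    """Level-2 LR: fit y ~ w0 + sum_j w_j * F[i][j] via the normal equations."""
    rows = [[1.0] + list(f) for f in F]
    p = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(AtA, Aty)


def stack_fit(X, y, base_fitters, folds=3):
    """Out-of-fold stacking: level-1 predictions on held-out folds train level 2."""
    n = len(X)
    oof = [[0.0] * len(base_fitters) for _ in range(n)]
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        Xtr, ytr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        for m, fit in enumerate(base_fitters):
            model = fit(Xtr, ytr)
            for i in test_idx:
                oof[i][m] = model(X[i])
    w = least_squares(oof, y)
    full = [fit(X, y) for fit in base_fitters]  # base models refit on all data
    return lambda x: w[0] + sum(wi * m(x) for wi, m in zip(w[1:], full))


if __name__ == "__main__":
    X = [float(i) for i in range(9)]
    y = [2 * x + 1 for x in X]  # toy, exactly linear target
    model = stack_fit(X, y, [knn_fit, lr_fit])
    print(model(4.5))  # close to 10.0: level 2 learns to trust the LR base model
```

Training level 2 on out-of-fold predictions, rather than on in-sample fits, keeps the meta-learner from over-weighting base models that merely memorize the training data.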
Abstract: Urban living in large modern cities has considerable adverse effects on health and thus increases the risk of several chronic kidney diseases (CKDs). Predicting CKDs has become a major task in urbanized countries. The primary objective of this work is to introduce and develop predictive analytics for CKDs. However, prediction over very large samples is becoming increasingly difficult. MapReduce provides a feasible framework for programming predictive algorithms with map and reduce functions. Its relatively simple programming interface helps address the scalability and efficiency of predictive learning algorithms. In the proposed work, an iterative weighted MapReduce framework is introduced to manage large dataset samples effectively. A binary classification problem is formulated using an ensemble of nonlinear support vector machines and random forests. Instead of the usual linear combination of kernel activations, the proposed work creates nonlinear combinations of kernel activations on prototype examples. Furthermore, different descriptors are combined in an ensemble of deep support vector machines, where the product rule is used to combine the probability estimates of different classifiers. Performance is evaluated in terms of the prediction accuracy and interpretability of the model and its results.
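The product rule mentioned above multiplies each classifier's class-probability estimates class by class and renormalizes. The minimal sketch below shows that rule inside a single-process imitation of the map and reduce phases. The "map" step scores each data partition, and the "reduce" step fuses each record's estimates. The classifiers, the `risk` feature and the partitions are hypothetical. The paper's iterative weighting and its SVM and random forest models are not reproduced.

```python
from functools import reduce


def product_rule(prob_vectors):
    """Combine per-class probability vectors from several classifiers.

    Multiplies the probabilities class by class and renormalises. If every
    class product is zero the total is 0, so classifiers should avoid
    emitting exact zeros.
    """
    combined = reduce(lambda acc, p: [a * b for a, b in zip(acc, p)], prob_vectors)
    total = sum(combined)
    return [c / total for c in combined]


def map_phase(chunk, classifiers):
    """'Map': score every (record_id, features) pair in one data partition."""
    return [(rid, [clf(x) for clf in classifiers]) for rid, x in chunk]


def reduce_phase(mapped_chunks):
    """'Reduce': fuse each record's classifier outputs with the product rule."""
    fused = {}
    for chunk in mapped_chunks:
        for rid, prob_vectors in chunk:
            fused[rid] = product_rule(prob_vectors)
    return fused


if __name__ == "__main__":
    # Hypothetical classifiers returning [P(no CKD), P(CKD)] from a toy risk score.
    classifiers = [
        lambda x: [1 - x["risk"], x["risk"]],
        lambda x: [0.5, 0.5],  # an uninformative classifier leaves the result unchanged
    ]
    partitions = [[(1, {"risk": 0.2})], [(2, {"risk": 0.6})]]
    print(reduce_phase(map_phase(part, classifiers) for part in partitions))
```

The product rule rewards agreement. For example, two classifiers giving [0.8, 0.2] and [0.6, 0.4] combine to about [0.857, 0.143], which is more confident than either alone. It also lets a single near-zero estimate veto a class.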
Funding: Supported by the National Natural Science Foundation of China (42130610, 41975076, and 42175067) and the National Key Research and Development Program of China (2019YFA0607104).
Abstract: Machine learning methods are effective tools for improving short-term climate prediction. However, commonly used methods usually build classification and regression prediction models separately and independently. Such a single modeling approach may produce inconsistent classification and regression predictions and thus may not meet the needs of practical applications well. To address this issue, this study proposes a selective Naive Bayes ensemble model (SENB-EM) that introduces causal effects and a voting strategy into Naive Bayes. The new model can both screen effective predictors and perform classification and regression prediction simultaneously. SENB-EM was applied to predicting the area of the summer western North Pacific subtropical high (WNPSH) from 2008 to 2021. It reached an accuracy classification score (a metric of overall classification accuracy) of 1.0 and a time correlation coefficient (TCC) of 0.81. The results of different models were then integrated for 2017-2021: the multiple linear regression ensemble model (MLR-EM), SENB-EM, and the Chinese Multimodel Ensemble Prediction System (CMME) used by the National Climate Center (NCC). The ensemble of SENB-EM and CMME reached a TCC of 0.92, the highest among them. This indicates that the summer WNPSH area predictions from SENB-EM are a valuable reference for real-time prediction. Notably, beyond numerical predictions, SENB-EM also provides numerical prediction intervals and predictions of the anomaly degree of the WNPSH area, giving meteorological forecasters more reference information. Overall, as a new hybrid machine learning model, SENB-EM has good prediction ability. Its approach of performing classification and regression prediction simultaneously through integration is informative for short-term climate prediction.
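The sketch below shows, under simplifying assumptions, how a voting ensemble of Naive Bayes classifiers can return a class and a numeric estimate together. Each member is a Gaussian Naive Bayes model trained on a different predictor subset, and the members vote on the class. The class is then mapped to the range and mean of the training targets in that class. The causal-effect predictor screening of SENB-EM is not reproduced. The predictor subsets, the class-to-number mapping (`class_summary`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def gnb_fit(X, y):
    """Gaussian Naive Bayes: per-class prior, feature means and variances."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps variances positive for constant features.
        vars_ = [sum((v - m) ** 2 for v in col) / n + 1e-9 for col, m in zip(zip(*rows), means)]
        model[c] = (n / len(y), means, vars_)
    return model


def gnb_predict(model, x):
    """Class with the largest log posterior (up to a shared constant)."""
    def log_post(c):
        prior, means, vars_ = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, vars_)
        )
    return max(model, key=log_post)


def ensemble_vote(X, y, x, subsets):
    """Train one Naive Bayes per predictor subset; return the majority-vote class."""
    votes = []
    for s in subsets:
        member = gnb_fit([[row[j] for j in s] for row in X], y)
        votes.append(gnb_predict(member, [x[j] for j in s]))
    return Counter(votes).most_common(1)[0][0]


def class_summary(targets, labels, c):
    """Hypothetical class-to-number mapping: (min, max, mean) of targets in class c."""
    vals = [v for v, lab in zip(targets, labels) if lab == c]
    return min(vals), max(vals), sum(vals) / len(vals)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [2.0, 2.0], [2.1, 1.8], [1.9, 2.2]]
    labels = [0, 0, 0, 1, 1, 1]                  # e.g. "normal" vs "above normal"
    area = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]  # toy continuous target
    c = ensemble_vote(X, labels, [2.0, 2.1], [[0], [1], [0, 1]])
    print(c, class_summary(area, labels, c))     # 1 (20.0, 22.0, 21.0)
```

Because the numeric output is derived from the voted class, the classification and the regression-style estimate cannot contradict each other. That consistency is the problem the abstract sets out to address.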
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 70601029 and 70221001, the Knowledge Innovation Program of the Chinese Academy of Sciences under Grant Nos. 3547600, 3046540, and 3047540, and the Strategy Research Grant of City University of Hong Kong under Grant No. 7001806.
Abstract: Because the economic system is complex and foreign trade interacts with many kinds of economic variables, foreign trade volume is not easy to predict. However, this difficulty is usually attributed to the limitations of many conventional forecasting models. To improve prediction performance, this study proposes a novel kernel-based ensemble learning approach that hybridizes econometric models and artificial intelligence (AI) models to predict China's foreign trade volume. The approach works in four steps. First, an important econometric model, the co-integration-based error correction vector auto-regression (EC-VAR) model, captures the impacts of various economic variables on Chinese foreign trade from a multivariate linear analysis perspective. Second, an artificial neural network (ANN) based EC-VAR model captures the nonlinear effects of economic variables on foreign trade. Third, to incorporate the effects of irregular events on foreign trade, text mining and experts' judgmental adjustments are integrated into the nonlinear ANN-based EC-VAR model. Finally, the economic variables, the outputs of the linear and nonlinear EC-VAR models, and the judgmental adjustment model serve as input variables to a typical kernel-based support vector regression (SVR) for ensemble prediction. For illustration, the proposed methodology is applied to predicting China's foreign trade volume. Experimental results reveal that the hybrid econometric-AI ensemble learning approach significantly improves prediction performance over the other linear and nonlinear models considered in this study.
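The final ensemble step described above feeds the outputs of several models into a kernel regressor. A full SVR needs a quadratic-programming solver, so this sketch swaps in a simpler relative: kernel ridge regression with an RBF kernel. It is not SVR. The two methods share the kernel expansion f(z) = sum_i alpha_i k(z_i, z) but use different loss functions. Each input row is assumed to stack the base-model outputs and economic variables. The toy inputs, `lam` and `gamma` are illustrative values.

```python
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][j] * w[j] for j in range(r + 1, n))) / M[r][r]
    return w


def krr_fit(Z, y, lam=1e-3, gamma=0.5):
    """Kernel ridge regression: solve (K + lam*I) alpha = y.

    Each row of Z is one time step's stacked inputs (e.g. linear EC-VAR
    forecast, ANN EC-VAR forecast, adjustment signal); this layout is assumed.
    """
    n = len(Z)
    K = [[rbf(Z[i], Z[j], gamma) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return lambda z: sum(a * rbf(zi, z, gamma) for a, zi in zip(alpha, Z))


if __name__ == "__main__":
    Z = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]  # toy stacked inputs
    y = [1.0, 2.0, 3.0, 4.0]                              # toy targets
    f = krr_fit(Z, y)
    print([round(f(z), 2) for z in Z])
```

The regularization weight `lam` trades training fit against smoothness. SVR makes a similar trade-off through its epsilon-insensitive loss and its C parameter.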
Abstract: Data mining is one solution to the problem of information explosion. Classification and prediction are among the most fundamental tasks in data mining. Many experiments have shown that ensembles of learning methods generally outperform single learning methods. It is therefore of great value to introduce ensemble learning into data mining. This paper introduces data mining and ensemble learning methods in turn. It then analyzes and formulates the role ensemble learning can play in several important practical areas of data mining: text mining, multimedia information mining and web mining.
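The claim that ensembles usually beat single learners can be made concrete under a strong, idealized assumption: the base classifiers err independently. Under that assumption, the accuracy of a majority vote follows from the binomial distribution. This is a textbook illustration rather than a result from the paper, and real base classifiers are rarely fully independent.

```python
from math import comb


def majority_vote_accuracy(n_classifiers, p):
    """Probability that a majority of n independent classifiers, each correct
    with probability p, is correct (binary task; n must be odd to avoid ties)."""
    if n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    need = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * p ** k * (1 - p) ** (n_classifiers - k)
        for k in range(need, n_classifiers + 1)
    )


if __name__ == "__main__":
    print(round(majority_vote_accuracy(3, 0.7), 3))  # 0.784
    print(round(majority_vote_accuracy(5, 0.7), 3))  # 0.837
    print(round(majority_vote_accuracy(3, 0.4), 3))  # 0.352
```

Voting helps only when the base learners are better than chance and their errors are not strongly correlated. With p = 0.4, the vote (0.352) is worse than a single classifier. This is why ensemble methods emphasize diversity among members.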