The importance of efficient water quality monitoring, and of government agencies ensuring the safety of drinking water in areas where the resource is constantly depleted by anthropogenic or natural factors, cannot be overemphasized. This holds for West Texas, and for Midland and Odessa in particular. Two machine learning regression algorithms (Random Forest and XGBoost) were employed to develop models for predicting total dissolved solids (TDS) and sodium adsorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: the Edwards-Trinity (Plateau) and Ogallala aquifers. These two aquifers have contributed immensely to providing water for domestic, agricultural, industrial, and other uses. The data were obtained from the Texas Water Development Board (TWDB). The XGBoost and Random Forest models used in this study accurately predicted the observed data (TDS and SAR) for both the Edwards-Trinity (Plateau) and Ogallala aquifers, with R^2 values consistently greater than 0.83. The Random Forest model gave the better prediction of TDS and SAR concentration, with an average R, MAE, RMSE, and MSE of 0.977, 0.015, 0.029, and 0.00, respectively. XGBoost achieved an average R, MAE, RMSE, and MSE of 0.953, 0.016, 0.037, and 0.00, respectively. The study indicates that Random Forest and XGBoost are appropriate for water quality prediction and monitoring in areas of high hydrocarbon activity such as Midland, Odessa, and West Texas at large.
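The error metrics quoted in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the `observed` and `predicted` lists are hypothetical scaled values, and `regression_metrics` is a helper name introduced here. The last two lines only square the reported RMSE values, which suggests why the MSE is reported as 0.00 at two decimal places.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                         # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical scaled values, for illustration only (not TWDB data).
observed = [0.10, 0.25, 0.40, 0.55, 0.70]
predicted = [0.12, 0.24, 0.37, 0.58, 0.69]
metrics = regression_metrics(observed, predicted)
print(metrics)

# MSE implied by the RMSE values reported in the abstract.
implied_mse_rf = 0.029 ** 2   # Random Forest
implied_mse_xgb = 0.037 ** 2  # XGBoost
print(round(implied_mse_rf, 2), round(implied_mse_xgb, 2))
```

Because MSE is the square of RMSE, errors of this size on scaled targets fall below 0.005 and round to 0.00, so RMSE and MAE are the more informative figures here.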
The Sentinel-2 satellites provide an unparalleled wealth of high-resolution remotely sensed information with a short revisit cycle, which is ideal for mapping burned areas accurately and in a timely manner. This paper proposes an automated methodology for mapping burn scars using pairs of Sentinel-2 images, exploiting the state-of-the-art eXtreme Gradient Boosting (XGB) machine learning framework. A large database of 64 reference wildfire perimeters in Greece from 2016 to 2019 is used to train the classifier. An empirical methodology for appropriately sampling the training patterns from this database is formulated, which guarantees the effectiveness of the approach and its computational efficiency. A difference (pre-fire minus post-fire) spectral index is used for this purpose, upon which we identify the clear and fuzzy value ranges. To reduce the data volume, a super-pixel segmentation of the images is also employed, implemented via the QuickShift algorithm. The cross-validation results showcase the effectiveness of the proposed algorithm, with average commission and omission errors of 9% and 2%, respectively, and an average Matthews correlation coefficient (MCC) of 0.93.
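For readers unfamiliar with the three scores reported above, the sketch below computes them from a binary confusion matrix using their standard definitions: commission error FP / (TP + FP), omission error FN / (TP + FN), and the usual MCC formula. The pixel counts are hypothetical and chosen only for illustration; they are not taken from the paper, and `burn_map_scores` is a helper name introduced here.

```python
import math


def burn_map_scores(tp, fp, fn, tn):
    """Commission error, omission error and MCC for a burned/unburned map."""
    commission = fp / (tp + fp)  # share of mapped burned area that is not burned
    omission = fn / (tp + fn)    # share of truly burned area that was missed
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return commission, omission, mcc


# Hypothetical pixel counts (assumption, not data from the study).
commission, omission, mcc = burn_map_scores(tp=900, fp=100, fn=20, tn=8980)
print(f"commission={commission:.3f} omission={omission:.3f} MCC={mcc:.3f}")
```

MCC uses all four cells of the confusion matrix, which makes it a reasonable summary when burned pixels are a small minority of the scene.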
Accurate assessment of undrained shear strength (USS) for soft sensitive clays is a great concern in geotechnical engineering practice. This study applies data-driven extreme gradient boosting (XGBoost) and random forest (RF) ensemble learning methods to capture the relationships between the USS and various basic soil parameters. Based on soil data sets from the TC304 database, a general approach is developed to predict the USS of soft clays using these two machine learning methods, with five feature variables: preconsolidation stress (PS), vertical effective stress (VES), liquid limit (LL), plastic limit (PL), and natural water content (W). To reduce the dependence on rules of thumb and inefficient brute-force search, Bayesian optimization is applied to determine appropriate hyper-parameters for both XGBoost and RF. The developed models are compared with three other machine learning methods and two transformation models with respect to predictive accuracy and robustness under 5-fold cross-validation (CV). The XGBoost-based and RF-based methods outperform these approaches. In addition, the XGBoost-based model provides feature importance ranks, which enhances the interpretability of the model and makes it a promising tool for predicting geotechnical parameters.
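The 5-fold cross-validation mentioned above can be illustrated with a small index-splitting sketch. It is a generic illustration under stated assumptions, not the authors' procedure: `k_fold_indices` is a hypothetical helper, the sample count of 23 is arbitrary, and the folds are taken in order without shuffling (in practice the data would usually be shuffled first).

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k (train, test) index pairs."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(23, k=5)  # 23 samples is an arbitrary example size
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each sample is held out exactly once, so averaging a metric over the five test folds gives an estimate of accuracy and robustness that does not depend on a single train/test split.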
It is important for regional water resources management to know agricultural water consumption several months in advance. Forecasting reference evapotranspiration (ET_0) for the next few months is important for irrigation and reservoir management. Studies on forecasting ET_0 multiple months ahead using machine learning models have not yet been reported. In addition, machine learning models such as XGBoost have multiple parameters that need to be tuned, and traditional tuning methods can get stuck in a local optimum and fail to reach the global optimum. This study investigated the performance of a hybrid extreme gradient boosting (XGBoost) model coupled with the Grey Wolf Optimizer (GWO) algorithm for forecasting multi-step-ahead ET_0 (1-3 months ahead), compared with three conventional machine learning models, i.e., standalone XGBoost, multi-layer perceptron (MLP), and M5 model tree (M5), in the subtropical zone of China. The results showed that the GWO-XGB model generally performed better than the other three machine learning models in forecasting ET_0 1-3 months ahead, followed by the XGB, M5, and MLP models, with very small differences among these three. The GWO-XGB model performed best in autumn, while the MLP model performed slightly better than the other three models in summer. It is thus suggested to apply the MLP model for ET_0 forecasting in summer and the GWO-XGB model in other seasons.
Complex modulus (G*) is one of the important criteria for asphalt classification according to AASHTO M320-10, and is often used to predict the linear viscoelastic behavior of asphalt binders. In addition, phase angle (φ) characterizes the deformation resilience of asphalt and is used to assess the ratio between the viscous and elastic components. It is thus important to estimate these two indicators quickly and accurately. The purpose of this investigation is to construct an extreme gradient boosting (XGB) model to predict G* and φ of graphene oxide (GO) modified asphalt at medium and high temperatures. Two data sets are gathered from previously published experiments, consisting of 357 samples for G* and 339 samples for φ, and these are used to develop the XGB model using nine inputs representing the asphalt binder components. The findings show that XGB is an excellent predictor of G* and φ of GO-modified asphalt, as evaluated by the coefficient of determination R^2 (R^2 = 0.990 and 0.9903 for G* and φ, respectively) and root mean square error (RMSE = 31.499 and 1.08 for G* and φ, respectively). In addition, the model's performance is compared with experimental results and five other machine learning (ML) models to highlight its accuracy. In the final step, Shapley additive explanations (SHAP) value analysis is conducted to assess the impact of each input, and the correlation between pairs of important features, on asphalt's two physical properties.
To enhance the accuracy and efficiency of bridge damage identification, a novel data-driven damage identification method was proposed. First, a convolutional autoencoder (CAE) was used to extract key features from the acceleration signal of the bridge structure through data reconstruction. The extreme gradient boosting tree (XGBoost) was then used to analyze the feature data and achieve damage detection with high accuracy and high performance. The proposed method was applied in a numerical simulation study on a three-span continuous girder and further validated experimentally on a scaled model of a cable-stayed bridge. The numerical simulation results show that the identification errors remain within 2.9% for six single-damage cases and within 3.1% for four double-damage cases. The experimental validation results demonstrate that when the tension in a single cable of the cable-stayed bridge decreases by 20%, the method accurately identifies damage at different cable locations using only sensors installed on the main girder, achieving identification accuracies above 95.8% in all cases. The proposed method shows high identification accuracy and generalization ability across various damage scenarios.
Concrete is the most commonly used construction material. However, its production leads to high carbon dioxide (CO_2) emissions and energy consumption. Therefore, developing waste-substitutable concrete components is necessary. Improving the sustainability and greenness of concrete is the focus of this research. In this regard, 899 data points were collected from existing studies in which cement, slag, fly ash, superplasticizer, coarse aggregate, and fine aggregate were considered potential influential factors. The complex relationship between these factors and concrete compressive strength makes the prediction and estimation of compressive strength difficult. Instead of the traditional compressive strength test, this study combines five novel metaheuristic algorithms with extreme gradient boosting (XGB) to predict the compressive strength of green concrete based on fly ash and blast furnace slag. The prediction models were assessed using the root mean square error (RMSE), coefficient of determination (R^2), mean absolute error (MAE), and variance accounted for (VAF). The results indicated that the squirrel search algorithm-extreme gradient boosting (SSA-XGB) model yielded the best overall prediction performance, with R^2 values of 0.9930 and 0.9576, VAF values of 99.30 and 95.79, MAE values of 0.52 and 2.50, and RMSE values of 1.34 and 3.31 for the training and testing sets, respectively. The remaining five prediction methods also yield promising results. Therefore, the developed hybrid XGB model can be introduced as an accurate and fast technique for predicting the performance of green concrete. Finally, the developed SSA-XGB model considered the effects of all the input factors on the compressive strength. The ability of the model to predict the performance of concrete with unknown proportions can play a significant role in accelerating the development and application of sustainable concrete and furthering a sustainable economy.
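Of the four metrics listed in this abstract, variance accounted for (VAF) is the least familiar. The abstract does not define it, so the sketch below assumes the commonly used form VAF = (1 - Var(y - ŷ) / Var(y)) × 100. The strength values are hypothetical and `vaf` is a helper name introduced here.

```python
def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vaf(observed, predicted):
    """Variance accounted for, in percent (assumed standard definition)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return (1.0 - variance(residuals) / variance(observed)) * 100.0


# Hypothetical compressive strengths in MPa (not data from the study).
observed = [30.0, 40.0, 50.0, 60.0]
close_fit = [31.0, 39.0, 52.0, 59.0]
biased_fit = [o + 2.0 for o in observed]  # constant offset of +2 MPa

print(vaf(observed, close_fit))
print(vaf(observed, biased_fit))
```

Under this definition a constant bias leaves VAF at 100, because only the variance of the residuals enters the formula. That is one reason to report it alongside MAE and RMSE, as the abstract does.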
Background: Accurate risk stratification of critically ill patients with coronavirus disease 2019 (COVID-19) is essential for optimizing resource allocation, delivering targeted interventions, and maximizing patient survival probability. Machine learning (ML) techniques are attracting increased interest for the development of prediction models, as they excel in the analysis of complex signals in data-rich environments such as critical care. Methods: We retrieved data on patients with COVID-19 admitted to an intensive care unit (ICU) between March and October 2020 from the RIsk Stratification in COVID-19 patients in the Intensive Care Unit (RISC-19-ICU) registry. We applied the Extreme Gradient Boosting (XGBoost) algorithm to the data to predict, as a binary outcome, the increase or decrease in patients' Sequential Organ Failure Assessment (SOFA) score on day 5 after ICU admission. The model was iteratively cross-validated in different subsets of the study cohort. Results: The final study population consisted of 675 patients. The XGBoost model correctly predicted a decrease in SOFA score in 320/385 (83%) critically ill COVID-19 patients, and an increase in the score in 210/290 (72%) patients. The area under the mean receiver operating characteristic curve for XGBoost was significantly higher than that for the logistic regression model (0.86 vs. 0.69, P<0.01 [paired t-test with 95% confidence interval]). Conclusions: The XGBoost model predicted the change in SOFA score in critically ill COVID-19 patients admitted to the ICU and can guide clinical decision support systems (CDSSs) aimed at optimizing available resources.
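The percentages in the Results can be checked directly from the reported counts. The short sketch below uses only the four numbers given in the abstract; the overall accuracy on the last lines is derived here from those counts and is not a figure the abstract itself reports.

```python
# Counts reported in the abstract.
decrease_correct, decrease_total = 320, 385  # SOFA decrease correctly predicted
increase_correct, increase_total = 210, 290  # SOFA increase correctly predicted

rate_decrease = decrease_correct / decrease_total
rate_increase = increase_correct / increase_total
total_patients = decrease_total + increase_total

# Derived value (not stated in the abstract): share of all patients classified correctly.
overall_accuracy = (decrease_correct + increase_correct) / total_patients

print(f"decrease: {rate_decrease:.1%}, increase: {rate_increase:.1%}")
print(f"patients: {total_patients}, overall: {overall_accuracy:.1%}")
```

The two class totals add up to the stated study population of 675, and the per-class rates round to the 83% and 72% quoted in the text.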
Accurate prediction of molten steel temperature in the ladle furnace (LF) refining process has an important influence on the quality of molten steel and the control of steelmaking cost. Extensive research on models for predicting molten steel temperature has been conducted. However, most researchers focus solely on improving the accuracy of the model and neglect its explainability. The present study aims to develop a high-precision and explainable model with improved reliability and transparency. The eXtreme gradient boosting (XGBoost) and light gradient boosting machine (LGBM) algorithms were used, along with Bayesian optimization and grey wolf optimization (GWO), to establish the prediction model. Different performance evaluation metrics and graphical representations were applied to compare the optimal XGBoost and LGBM models, obtained through the different hyperparameter optimization methods, with the other models. The findings indicated that the GWO-LGBM model outperformed the other methods in predicting molten steel temperature, with a prediction accuracy of 89.35% within the error range of ±5°C. The model's learning and decision process was revealed, and the degree of influence of different variables on the molten steel temperature was clarified, using tree structure visualization and SHapley Additive exPlanations (SHAP) analysis. Consequently, the explainability of the optimal GWO-LGBM model was enhanced, providing reliable support for its prediction results.
Synthetic aperture radar (SAR) and wave spectrometers, both crucial in microwave remote sensing, play an essential role in monitoring sea surface wind and wave conditions. However, they face inherent limitations in observing sea surface phenomena. SAR systems, for instance, are hindered by an azimuth cutoff phenomenon in sea surface wind field observation. Wave spectrometers, while unaffected by the azimuth cutoff, suffer from low azimuth resolution, which limits the capture of detailed wave and wind field data. This study uses SAR and surface wave investigation and monitoring (SWIM) data to extract key feature parameters, which are then prioritized using the extreme gradient boosting (XGBoost) algorithm. The research further addresses feature collinearity through a combined analysis of feature importance and correlation, leading to the development of an XGBoost-based inversion model for wave and wind parameters. Comparing the model's significant wave height, mean wave period, wind direction, and wind speed against ERA5 reanalysis gives root mean square errors of 0.212 m, 0.525 s, 27.446°, and 1.092 m/s, respectively, and against buoy data 0.314 m, 0.888 s, 27.698°, and 1.315 m/s, respectively. These results demonstrate the model's effective retrieval of wave and wind parameters. Finally, the model, incorporating altimeter and scatterometer data, is evaluated against SAR/SWIM single- and dual-payload inversion methods across different wind speeds. This comparison highlights the model's superior inversion accuracy over the other methods.
Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people's lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and lack specificity. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models by constructing statistical models. Taking western Henan as an example, the study considered 16 landslide influencing factors covering topography, geological environment, hydrological conditions, and human activities, and used the recursive feature elimination (RFE) method to select the 11 factors with the most significant influence on landslides. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct the spatial distribution model of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed the best and was suitable for dealing with regression problems. The model adapted well to the landslide data. The overall distribution can be observed from the landslide susceptibility maps of the five models. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments, and frequent human engineering activities. The extremely high and highly prone areas were 12043.3 km^2 and 3087.45 km^2, accounting for 47.61% and 12.20% of the total study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province, which provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
Lithium-sulfur (Li-S) batteries are notable for their high theoretical energy density, but the 'shuttle effect' and the limited conversion kinetics of Li-S species can downgrade their actual performance. An essential strategy is to design anchoring materials (AMs) to appropriately adsorb Li-S species. Herein, we propose a new three-procedure protocol, named InfoAd (Informative Adsorption), to evaluate the anchoring of Li_2S on two-dimensional (2D) materials and disclose the underlying importance of material features by combining a high-throughput calculation workflow and machine learning (ML). In this paradigm, we calculate the anchoring of Li_2S on 1255 2D A_xB_y (B in the VIA/VIIA group) materials and pick out 44 (un)reported nontoxic 2D binary A_xB_y AMs, in which the importance of the geometric features on the anchoring effect is revealed by ML for the first time. We develop a new Infograph model for crystals to accurately predict whether a material has a moderate binding with Li_2S and extend it to all 2D materials. Our InfoAd protocol elucidates the underlying structure-property relationship of Li_2S adsorption on 2D materials and provides a general research framework of adsorption-related materials for catalysis and energy/substance storage.
The dead fuel moisture content (DFMC) is the key driver of fire occurrence. Accurately estimating the DFMC could help identify locations facing fire risks, prioritise areas for fire monitoring, and facilitate timely deployment of fire-suppression resources. In this study, the DFMC and environmental variables, including air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure, soil temperature, and soil humidity, were simultaneously measured in a grassland of Ergun City, Inner Mongolia Autonomous Region of China, in 2021. We chose three regression models, i.e., random forest (RF), extreme gradient boosting (XGB), and boosted regression tree (BRT), to model the seasonal DFMC from the data collected. To ensure accuracy, we added time-lag variables of up to 3 d to the models. The results showed that, among the three models, the RF model had the best fit, with an R^2 value of 0.847, and the best prediction accuracy, with a mean absolute error of 4.764%. The accuracies of the models in spring and autumn were higher than those in the other two seasons. In addition, different seasons had different key influencing factors, and the degree of influence of these factors on the DFMC changed with the time lag. Moreover, time-lag variables within 44 h clearly improved the fit and prediction accuracy, indicating that environmental conditions within approximately 48 h greatly influence the DFMC. This study highlights the importance of considering 48 h time-lagged variables when predicting the DFMC of grassland fuels and when mapping grassland fire risks based on the DFMC to help locate high-priority areas for grassland fire monitoring and prevention.
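Time-lag variables of the kind described above are typically built by pairing each observation with earlier values of the same series. The sketch below is a generic illustration, not the study's feature pipeline: the hourly humidity series is synthetic, the lags of 12, 24, and 48 h are assumptions chosen for the example, and `add_lagged_features` is a hypothetical helper.

```python
def add_lagged_features(series, lags):
    """Return rows of [value at t, value at t - lag1, value at t - lag2, ...]."""
    max_lag = max(lags)
    rows = []
    # Start at max_lag so that every requested lag is available for each row.
    for t in range(max_lag, len(series)):
        rows.append([series[t]] + [series[t - lag] for lag in lags])
    return rows


# Synthetic hourly relative-humidity readings (assumption, illustration only).
hourly_humidity = [float(h) for h in range(60)]
rows = add_lagged_features(hourly_humidity, lags=[12, 24, 48])
print(len(rows), rows[0])
```

Each row can then be joined with the DFMC measured at the same time step and passed to a regression model, so that the model sees conditions up to 48 h before the measurement.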
Obesity is a critical health condition that severely affects an individual's quality of life and well-being. The occurrence of obesity is strongly associated with serious health conditions, such as cardiac diseases, diabetes, hypertension, and some types of cancer. Therefore, it is vital to avoid obesity and/or reverse its occurrence. Incorporating healthy food habits and an active lifestyle can help to prevent obesity. In this regard, artificial intelligence (AI) can play an important role in estimating health conditions and detecting obesity and its types. This study aims to estimate obesity levels in adults by applying AI-enabled machine learning to a real-life dataset. This dataset is in the form of electronic health records (EHR) containing data on several aspects of daily living, such as dietary habits, physical conditions, and lifestyle variables, for participants with different health conditions (underweight, normal, overweight, and obesity type I, II, and III), expressed in terms of features such as physical condition, food intake, lifestyle, and mode of transportation. Three classifiers, i.e., eXtreme gradient boosting classifier (XGB), support vector machine (SVM), and artificial neural network (ANN), are implemented to detect the status of several conditions, including obesity types. The findings indicate that the proposed XGB-based system outperforms the existing obesity level estimation methods, achieving overall performance rates of 98.5% and 99.6% in the scenarios explored.
Alzheimer's disease is a non-reversible, non-curable, and progressive neurological disorder that induces the shrinkage and death of a specific neuronal population associated with memory formation and retention. It is a frequently occurring illness that accounts for about 60%-80% of dementia cases. It is usually observed among people aged 60 years and above. Depending on the severity of symptoms, patients can be categorized as Cognitive Normal (CN), Mild Cognitive Impairment (MCI), or Alzheimer's Disease (AD). Alzheimer's disease is the last phase, in which the brain is severely damaged and patients are not able to live on their own. Radiomics is an approach to extracting a large number of features from medical images with the help of data characterization algorithms. Here, 105 radiomic features are extracted and used to predict Alzheimer's disease. This paper uses Support Vector Machine, K-Nearest Neighbour, Gaussian Naïve Bayes, eXtreme Gradient Boosting (XGBoost), and Random Forest to predict Alzheimer's disease. The proposed random forest-based approach with the radiomic features achieved an accuracy of 85%. This approach also achieved 88% accuracy, 88% recall, 88% precision, and 87% F1-score for AD vs. CN; 72% accuracy, 73% recall, 72% precision, and 71% F1-score for AD vs. MCI; and 69% accuracy, 69% recall, 68% precision, and 69% F1-score for MCI vs. CN. The comparative analysis shows that the proposed approach performs better than other approaches.
Funding (undrained shear strength study): Financial support from the High-end Foreign Expert Introduction program (No. G20190022002), the Chongqing Construction Science and Technology Plan Project (2019-0045), and the Chongqing Engineering Research Center of Disaster Prevention & Control for Banks and Structures in Three Gorges Reservoir Area (Nos. SXAPGC18ZD01 and SXAPGC18YB03).
Funding (reference evapotranspiration forecasting study): This study was jointly supported by the National Natural Science Foundation of China (Nos. 51879196, 51790533, 51709143) and the Jiangxi Natural Science Foundation of China (No. 20181BAB206045).
文摘It is important for regional water resources management to know the agricultural water consumption information several months in advance.Forecasting reference evapotranspiration(ET_(0))in the next few months is important for irrigation and reservoir management.Studies on forecasting of multiple-month ahead ET_(0) using machine learning models have not been reported yet.Besides,machine learning models such as the XGBoost model has multiple parameters that need to be tuned,and traditional methods can get stuck in a regional optimal solution and fail to obtain a global optimal solution.This study investigated the performance of the hybrid extreme gradient boosting(XGBoost)model coupled with the Grey Wolf Optimizer(GWO)algorithm for forecasting multi-step ahead ET_(0)(1-3 months ahead),compared with three conventional machine learning models,i.e.,standalone XGBoost,multi-layer perceptron(MLP)and M5 model tree(M5)models in the subtropical zone of China.The results showed that theGWO-XGB model generally performed better than the other three machine learning models in forecasting 1-3 months ahead ET_(0),followed by the XGB,M5 and MLP models with very small differences among the three models.The GWO-XGB model performed best in autumn,while the MLP model performed slightly better than the other three models in summer.It is thus suggested to apply the MLP model for ET_(0) forecasting in summer but use the GWO-XGB model in other seasons.
Abstract: Complex modulus (G^(*)) is one of the important criteria for asphalt classification according to AASHTO M320-10, and is often used to predict the linear viscoelastic behavior of asphalt binders. In addition, phase angle (φ) characterizes the deformation resilience of asphalt and is used to assess the ratio between the viscous and elastic components. It is thus important to quickly and accurately estimate these two indicators. The purpose of this investigation is to construct an extreme gradient boosting (XGB) model to predict G^(*) and φ of graphene oxide (GO) modified asphalt at medium and high temperatures. Two data sets are gathered from previously published experiments, consisting of 357 samples for G^(*) and 339 samples for φ, and these are used to develop the XGB model using nine inputs representing the asphalt binder components. The findings show that XGB is an excellent predictor of G^(*) and φ of GO-modified asphalt, evaluated by the coefficient of determination R^(2) (R^(2)=0.990 and 0.9903 for G^(*) and φ, respectively) and root mean square error (RMSE=31.499 and 1.08 for G^(*) and φ, respectively). In addition, the model’s performance is compared with experimental results and five other machine learning (ML) models to highlight its accuracy. In the final step, the Shapley additive explanations (SHAP) value analysis is conducted to assess the impact of each input and the correlation between pairs of important features on asphalt’s two physical properties.
Funding: The National Natural Science Foundation of China (Nos. 52361165658, 52378318, 52078459).
Abstract: To enhance the accuracy and efficiency of bridge damage identification, a novel data-driven damage identification method was proposed. First, a convolutional autoencoder (CAE) was used to extract key features from the acceleration signal of the bridge structure through data reconstruction. The extreme gradient boosting tree (XGBoost) was then used to analyze the feature data to achieve damage detection with high accuracy and high performance. The proposed method was applied in a numerical simulation study on a three-span continuous girder and further validated experimentally on a scaled model of a cable-stayed bridge. The numerical simulation results show that the identification errors remain within 2.9% for six single-damage cases and within 3.1% for four double-damage cases. The experimental validation results demonstrate that when the tension in a single cable of the cable-stayed bridge decreases by 20%, the method accurately identifies damage at different cable locations using only sensors installed on the main girder, achieving identification accuracies above 95.8% in all cases. The proposed method shows high identification accuracy and generalization ability across various damage scenarios.
Funding: Funding provided by the China Scholarship Council (Nos. 202008440524 and 202006370006); supported by the Distinguished Youth Science Foundation of Hunan Province of China (No. 2022JJ10073), the Innovation Driven Project of Central South University (No. 2020CX040), and the Shenzhen Science and Technology Plan (No. JCYJ20190808123013260).
Abstract: Concrete is the most commonly used construction material. However, its production leads to high carbon dioxide (CO_(2)) emissions and energy consumption. Therefore, developing waste-substitutable concrete components is necessary. Improving the sustainability and greenness of concrete is the focus of this research. In this regard, 899 data points were collected from existing studies where cement, slag, fly ash, superplasticizer, coarse aggregate, and fine aggregate were considered potential influential factors. The complex relationship between influential factors and concrete compressive strength makes the prediction and estimation of compressive strength difficult. Instead of the traditional compressive strength test, this study combines five novel metaheuristic algorithms with extreme gradient boosting (XGB) to predict the compressive strength of green concrete based on fly ash and blast furnace slag. The intelligent prediction models were assessed using the root mean square error (RMSE), coefficient of determination (R^(2)), mean absolute error (MAE), and variance accounted for (VAF). The results indicated that the squirrel search algorithm-extreme gradient boosting (SSA-XGB) yielded the best overall prediction performance with R^(2) values of 0.9930 and 0.9576, VAF values of 99.30 and 95.79, MAE values of 0.52 and 2.50, and RMSE values of 1.34 and 3.31 for the training and testing sets, respectively. The remaining five prediction methods yield promising results. Therefore, the developed hybrid XGB model can be introduced as an accurate and fast technique for the performance prediction of green concrete. Finally, the developed SSA-XGB considered the effects of all the input factors on the compressive strength. The ability of the model to predict the performance of concrete with unknown proportions can play a significant role in accelerating the development and application of sustainable concrete and furthering a sustainable economy.
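The four evaluation metrics named in the abstract above (RMSE, R^(2), MAE, VAF) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `regression_metrics` is hypothetical, and VAF is assumed to follow its common definition, (1 - var(error) / var(observed)) x 100, which the abstract does not spell out.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2 and VAF (in percent) for two equal-length sequences.

    Hypothetical helper for illustration; VAF uses the common definition
    (1 - var(y_true - y_pred) / var(y_true)) * 100, which is an assumption.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true must not be constant")
    r2 = 1.0 - ss_res / ss_tot

    # Population variances of the errors and of the observed values.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    var_true = ss_tot / n
    vaf = (1.0 - var_err / var_true) * 100.0

    return {"rmse": rmse, "mae": mae, "r2": r2, "vaf": vaf}


if __name__ == "__main__":
    # Made-up strengths; a constant offset of 1 in every prediction.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print(regression_metrics(observed, predicted))
```

With a constant prediction offset, as in the made-up data above, VAF stays at 100 while R^(2) drops, because VAF ignores bias and only measures the variance of the error. This is one reason the two metrics are reported side by side.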
Funding: Supported by the “Microsoft Grant Award: AI for Health COVID-19”. The RISC-19-ICU registry is supported by the Swiss Society of Intensive Care Medicine and funded by internal resources of the Institute of Intensive Care Medicine of the University Hospital Zurich and by unrestricted grants from CytoSorbents Europe GmbH (Berlin, Germany) and Union Bancaire Privée (Zurich, Switzerland). The sponsors had no role in the design of the study, the collection and analysis of the data, or the preparation of the manuscript.
Abstract: Background: Accurate risk stratification of critically ill patients with coronavirus disease 2019 (COVID-19) is essential for optimizing resource allocation, delivering targeted interventions, and maximizing patient survival probability. Machine learning (ML) techniques are attracting increased interest for the development of prediction models as they excel in the analysis of complex signals in data-rich environments such as critical care. Methods: We retrieved data on patients with COVID-19 admitted to an intensive care unit (ICU) between March and October 2020 from the RIsk Stratification in COVID-19 patients in the Intensive Care Unit (RISC-19-ICU) registry. We applied the Extreme Gradient Boosting (XGBoost) algorithm to the data to predict as a binary outcome the increase or decrease in patients’ Sequential Organ Failure Assessment (SOFA) score on day 5 after ICU admission. The model was iteratively cross-validated in different subsets of the study cohort. Results: The final study population consisted of 675 patients. The XGBoost model correctly predicted a decrease in SOFA score in 320/385 (83%) critically ill COVID-19 patients, and an increase in the score in 210/290 (72%) patients. The area under the mean receiver operating characteristic curve for XGBoost was significantly higher than that for the logistic regression model (0.86 vs. 0.69, P<0.01 [paired t-test with 95% confidence interval]). Conclusions: The XGBoost model predicted the change in SOFA score in critically ill COVID-19 patients admitted to the ICU and can guide clinical decision support systems (CDSSs) aimed at optimizing available resources.
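The per-class percentages in the Results above follow directly from the reported counts. The short sketch below reproduces that arithmetic. The overall and balanced accuracy are values derived here from the same counts and are not reported in the abstract, and the function name `summarize_sofa_predictions` is hypothetical.

```python
def summarize_sofa_predictions(decrease_correct, decrease_total,
                               increase_correct, increase_total):
    """Derive per-class rates and accuracies from the four reported counts.

    Hypothetical helper; overall and balanced accuracy are derived values
    that the abstract itself does not state.
    """
    if decrease_total <= 0 or increase_total <= 0:
        raise ValueError("class totals must be positive")

    decrease_rate = decrease_correct / decrease_total
    increase_rate = increase_correct / increase_total
    n = decrease_total + increase_total
    overall = (decrease_correct + increase_correct) / n
    balanced = (decrease_rate + increase_rate) / 2

    return {
        "n": n,
        "decrease_rate": decrease_rate,
        "increase_rate": increase_rate,
        "overall_accuracy": overall,
        "balanced_accuracy": balanced,
    }


if __name__ == "__main__":
    # Counts as reported: 320/385 decreases and 210/290 increases predicted correctly.
    summary = summarize_sofa_predictions(320, 385, 210, 290)
    for name, value in summary.items():
        print(name, value)
```

The two class totals sum to 675, matching the stated study population, and the per-class rates round to the reported 83% and 72%.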
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 51974023 and 52374321), the funding of State Key Laboratory of Advanced Metallurgy, University of Science and Technology Beijing (No. 41621005), and the Youth Science and Technology Innovation Fund of Jianlong Group-University of Science and Technology Beijing (No. 20231235).
Abstract: Accurate prediction of molten steel temperature in the ladle furnace (LF) refining process has an important influence on the quality of molten steel and the control of steelmaking cost. Extensive research on establishing models to predict molten steel temperature has been conducted. However, most researchers focus solely on improving the accuracy of the model, neglecting its explainability. The present study aims to develop a high-precision and explainable model with improved reliability and transparency. The eXtreme gradient boosting (XGBoost) and light gradient boosting machine (LGBM) were utilized, along with Bayesian optimization and grey wolf optimization (GWO), to establish the prediction model. Different performance evaluation metrics and graphical representations were applied to compare the optimal XGBoost and LGBM models obtained through varying hyperparameter optimization methods with the other models. The findings indicated that the GWO-LGBM model outperformed other methods in predicting molten steel temperature, with a high prediction accuracy of 89.35% within the error range of ±5°C. The model’s learning/decision process was revealed, and the influence degree of different variables on the molten steel temperature was clarified using the tree structure visualization and SHapley Additive exPlanations (SHAP) analysis. Consequently, the explainability of the optimal GWO-LGBM model was enhanced, providing reliable support for prediction results.
Funding: The project was supported by the Key Laboratory of Space Ocean Remote Sensing and Application, Ministry of Natural Resources under contract No. 2023CFO016, the National Natural Science Foundation of China under contract No. 61931025, the Innovation Fund Project for Graduate Student of China University of Petroleum (East China), and the Fundamental Research Funds for the Central Universities under contract No. 23CX04042A.
Abstract: Synthetic aperture radar (SAR) and wave spectrometers, crucial in microwave remote sensing, play an essential role in monitoring sea surface wind and wave conditions. However, they face inherent limitations in observing sea surface phenomena. SAR systems, for instance, are hindered by an azimuth cutoff phenomenon in sea surface wind field observation. Wave spectrometers, while unaffected by the azimuth cutoff phenomenon, struggle with low azimuth resolution, impacting the capture of detailed wave and wind field data. This study utilizes SAR and surface wave investigation and monitoring (SWIM) data to initially extract key feature parameters, which are then prioritized using the extreme gradient boosting (XGBoost) algorithm. The research further addresses feature collinearity through a combined analysis of feature importance and correlation, leading to the development of an inversion model for wave and wind parameters based on XGBoost. A comparative analysis of this model with ERA5 reanalysis and buoy data for significant wave height, mean wave period, wind direction, and wind speed reveals root mean square errors of 0.212 m, 0.525 s, 27.446°, and 1.092 m/s, compared to 0.314 m, 0.888 s, 27.698°, and 1.315 m/s from buoy data, respectively. These results demonstrate the model’s effective retrieval of wave and wind parameters. Finally, the model, incorporating altimeter and scatterometer data, is evaluated against SAR/SWIM single and dual payload inversion methods across different wind speeds. This comparison highlights the model’s superior inversion accuracy over other methods.
Funding: This work was financially supported by the National Natural Science Foundation of China (41972262), the Hebei Natural Science Foundation for Excellent Young Scholars (D2020504032), the Central Plains Science and Technology Innovation Leader Project (214200510030), and the Key Research and Development Project of Henan Province (221111321500).
Abstract: Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people’s lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and poorly targeted. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models by constructing statistical models. Taking western Henan as an example, the study selected 16 landslide influencing factors such as topography, geological environment, hydrological conditions, and human activities, and the 11 factors with the most significant influence on landslides were selected by the recursive feature elimination (RFE) method. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct the spatial distribution model of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical index. After analysis and comparison, the XGBoost model (AUC 0.8759) performed the best and was suitable for dealing with regression problems. The model had a high adaptability to landslide data. According to the landslide susceptibility map of the five models, the overall distribution can be observed. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments and frequent human engineering activities. The extremely high and highly prone areas were 12043.3 km^(2) and 3087.45 km^(2), accounting for 47.61% and 12.20% of the total area of the study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province, which provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
Funding: Supported by the National Key Research and Development Program of China (2022YFA1503101), the National Natural Science Foundation of China (22173067, 22203058), the Science and Technology Project of Jiangsu Province (BK20200873, BZ2020011), the Science and Technology Development Fund, Macao SAR (0052/2021/A), the Collaborative Innovation Center of Suzhou Nano Science & Technology, the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the 111 Project, and the Joint International Research Laboratory of Carbon-Based Functional Materials and Devices.
Abstract: Lithium-sulfur (Li-S) batteries are notable for their high theoretical energy density, but the ‘shuttle effect’ and the limited conversion kinetics of Li-S species can downgrade their actual performance. An essential strategy is to design anchoring materials (AMs) to appropriately adsorb Li-S species. Herein, we propose a new three-procedure protocol, named InfoAd (Informative Adsorption), to evaluate the anchoring of Li_(2)S on two-dimensional (2D) materials and disclose the underlying importance of material features by combining a high-throughput calculation workflow and machine learning (ML). In this paradigm, we calculate the anchoring of Li_(2)S on 1255 2D A_(x)B_(y) (B in the VIA/VIIA group) materials and pick out 44 (un)reported nontoxic 2D binary A_(x)B_(y) AMs, in which the importance of the geometric features on the anchoring effect is revealed by ML for the first time. We develop a new Infograph model for crystals to accurately predict whether a material has a moderate binding with Li_(2)S and extend it to all 2D materials. Our InfoAd protocol elucidates the underlying structure-property relationship of Li_(2)S adsorption on 2D materials and provides a general research framework of adsorption-related materials for catalysis and energy/substance storage.
Funding: Funded by the National Key Research and Development Program of China Strategic International Cooperation in Science and Technology Innovation Program (2018YFE0207800) and the National Natural Science Foundation of China (31971483).
Abstract: The dead fuel moisture content (DFMC) is the key driver leading to fire occurrence. Accurately estimating the DFMC could help identify locations facing fire risks, prioritise areas for fire monitoring, and facilitate timely deployment of fire-suppression resources. In this study, the DFMC and environmental variables, including air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure, soil temperature, and soil humidity, were simultaneously measured in a grassland of Ergun City, Inner Mongolia Autonomous Region of China in 2021. We chose three regression models, i.e., random forest (RF) model, extreme gradient boosting (XGB) model, and boosted regression tree (BRT) model, to model the seasonal DFMC according to the data collected. To ensure accuracy, we added time-lag variables of 3 d to the models. The results showed that the RF model had the best fitting effect with an R^(2) value of 0.847 and a prediction accuracy with a mean absolute error score of 4.764% among the three models. The accuracies of the models in spring and autumn were higher than those in the other two seasons. In addition, different seasons had different key influencing factors, and the degree of influence of these factors on the DFMC changed with time lags. Moreover, time-lag variables within 44 h clearly improved the fitting effect and prediction accuracy, indicating that environmental conditions within approximately 48 h greatly influence the DFMC. This study highlights the importance of considering 48 h time-lagged variables when predicting the DFMC of grassland fuels and mapping grassland fire risks based on the DFMC to help locate high-priority areas for grassland fire monitoring and prevention.
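The time-lag variables mentioned in the abstract above can be illustrated with a small sketch that turns one regularly sampled series into rows of current and lagged values. This is only an assumed construction: the abstract does not state the sampling interval or how the lags were built, and the function name `make_lag_rows` and the sample numbers are hypothetical.

```python
def make_lag_rows(series, lags):
    """Build rows [x_t, x_(t-lag1), x_(t-lag2), ...] from a regularly sampled series.

    Hypothetical helper: `lags` are positive step counts (e.g. hours if the
    series is hourly, which is an assumption). Rows start at the first index
    for which every requested lag is available.
    """
    if not lags or any(lag <= 0 for lag in lags):
        raise ValueError("lags must be a non-empty list of positive integers")

    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t]] + [series[t - lag] for lag in lags])
    return rows


if __name__ == "__main__":
    # Made-up relative humidity readings at equal time steps.
    humidity = [10, 11, 12, 13, 14]
    for row in make_lag_rows(humidity, lags=[1, 2]):
        print(row)
```

Each row could then be joined with the DFMC measured at the same time step and passed to a regression model such as RF or XGB. Dropping the first `max(lags)` samples avoids rows with missing lagged values.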
Funding: The authors would like to acknowledge the support of the Deputy for Research and Innovation, Ministry of Education, Kingdom of Saudi Arabia, for this research through a grant (NU/IFC/ENT/01/020) under the Institutional Funding Committee at Najran University, Kingdom of Saudi Arabia.
Abstract: Obesity is a critical health condition that severely affects an individual’s quality of life and well-being. The occurrence of obesity is strongly associated with extreme health conditions, such as cardiac diseases, diabetes, hypertension, and some types of cancer. Therefore, it is vital to avoid obesity and/or reverse its occurrence. Incorporating healthy food habits and an active lifestyle can help to prevent obesity. In this regard, artificial intelligence (AI) can play an important role in estimating health conditions and detecting obesity and its types. This study aims to detect obesity levels in adults by implementing AI-enabled machine learning on a real-life dataset. This dataset is in the form of electronic health records (EHR) containing data on several aspects of daily living, such as dietary habits, physical conditions, and lifestyle variables for various participants with different health conditions (underweight, normal, overweight, and obesity type I, II and III), expressed in terms of a variety of features or parameters, such as physical condition, food intake, lifestyle and mode of transportation. Three classifiers, i.e., eXtreme gradient boosting classifier (XGB), support vector machine (SVM), and artificial neural network (ANN), are implemented to detect the status of several conditions, including obesity types. The findings indicate that the proposed XGB-based system outperforms the existing obesity level estimation methods, achieving overall performance rates of 98.5% and 99.6% in the scenarios explored.
Abstract: Alzheimer’s disease is a non-reversible, non-curable, and progressive neurological disorder that induces the shrinkage and death of a specific neuronal population associated with memory formation and retention. It is a frequently occurring mental illness that accounts for about 60%–80% of cases of dementia. It is usually observed among people aged 60 years and above. Depending upon the severity of symptoms, patients can be categorized as Cognitive Normal (CN), Mild Cognitive Impairment (MCI) or Alzheimer’s Disease (AD). Alzheimer’s disease is the last phase of the disease, where the brain is severely damaged and the patients are not able to live on their own. Radiomics is an approach to extracting a huge number of features from medical images with the help of data characterization algorithms. Here, 105 radiomic features are extracted and used to predict Alzheimer’s disease. This paper uses Support Vector Machine, K-Nearest Neighbour, Gaussian Naïve Bayes, eXtreme Gradient Boosting (XGBoost) and Random Forest to predict Alzheimer’s disease. The proposed random forest-based approach with the radiomic features achieved an accuracy of 85%. This proposed approach also achieved 88% accuracy, 88% recall, 88% precision and 87% F1-score for AD vs. CN; 72% accuracy, 73% recall, 72% precision and 71% F1-score for AD vs. MCI; and 69% accuracy, 69% recall, 68% precision and 69% F1-score for MCI vs. CN. The comparative analysis shows that the proposed approach performs better than other approaches.