Accurate prediction of molten steel temperature in the ladle furnace (LF) refining process has an important influence on the quality of molten steel and the control of steelmaking cost. Extensive research has been conducted on establishing models to predict molten steel temperature. However, most researchers focus solely on improving the accuracy of the model and neglect its explainability. The present study aims to develop a high-precision, explainable model with improved reliability and transparency. The eXtreme gradient boosting (XGBoost) and light gradient boosting machine (LGBM) algorithms were used, along with Bayesian optimization and grey wolf optimization (GWO), to establish the prediction model. Different performance evaluation metrics and graphical representations were applied to compare the optimal XGBoost and LGBM models, obtained through the different hyperparameter optimization methods, with other models. The findings indicated that the GWO-LGBM model outperformed the other methods in predicting molten steel temperature, with a prediction accuracy of 89.35% within the error range of ±5 °C. The model's learning/decision process was revealed, and the degree of influence of different variables on the molten steel temperature was clarified using tree structure visualization and SHapley Additive exPlanations (SHAP) analysis. Consequently, the explainability of the optimal GWO-LGBM model was enhanced, providing reliable support for the prediction results.
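The "accuracy within ±5 °C" figure quoted above is a tolerance hit rate: the share of samples whose predicted temperature falls inside a fixed band around the measured value. A minimal Python sketch of that metric follows. The function name `hit_rate` and the sample temperatures are illustrative assumptions, not values or code from the study.

```python
def hit_rate(y_true, y_pred, tol=5.0):
    """Return the fraction of predictions with absolute error <= tol."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


# Hypothetical measured and predicted temperatures in degrees Celsius.
measured = [1580.0, 1592.0, 1601.0, 1575.0]
predicted = [1583.0, 1586.0, 1600.0, 1581.5]
# Absolute errors are 3.0, 6.0, 1.0 and 6.5, so two of four fall within 5 degrees.
print(hit_rate(measured, predicted))
```

Unlike a mean error, this metric is insensitive to how far a miss lies outside the band, which is why it is usually reported alongside error-magnitude metrics.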
Colletotrichum kahawae (Coffee Berry Disease, CBD) spreads through spores that can be carried by wind, rain, and insects, affects coffee plantations, and causes 80% yield losses and poor-quality coffee beans. The deadly disease is hard to control because wind, rain, and insects carry the spores. Colombian researchers utilized a deep learning system to identify CBD in coffee cherries at three growth stages and classified photographs of infected and uninfected cherries with 93% accuracy using a random forest method. If the dataset is too small and noisy, the algorithm may not learn data patterns and generate accurate predictions. To overcome this challenge, early detection of Colletotrichum kahawae disease in coffee cherries requires automated processes, prompt recognition, and accurate classification. The proposed methodology selects CBD image datasets through four different stages for training and testing. XGBoost is used to train a model on datasets of coffee berries, with each image labeled as healthy or diseased. Once the model is trained, the SHAP algorithm is used to determine which features were essential for making predictions with the proposed model. These features included the cherry's colour, whether it had spots or other damage, and how big the lesions were. Virtual inception is important for classification, as it is used to virtualize the relationship between the colour of the berry and the presence of disease. To evaluate the model's performance and mitigate overfitting, a 10-fold cross-validation approach is employed. This involves partitioning the dataset into ten subsets, training the model on nine of them and evaluating it on the remaining one, and rotating until each subset has been used once for evaluation. Compared with other contemporary methodologies, the proposed model achieved an accuracy of 98.56%.
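The 10-fold cross-validation mentioned in the abstract above can be sketched in plain Python. In the standard procedure each fold is held out once for evaluation while the model is trained on the other nine. The helper name `k_fold_indices`, the seed, and the sample count are assumptions made for illustration; the study's own splitting code is not given in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, test_idx


# Hypothetical dataset of 50 labeled images.
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(50, k=10), start=1):
    print(fold_no, len(train_idx), len(test_idx))
```

A model would be fitted on `train_idx` and scored on `test_idx` in each iteration, and the ten scores averaged to obtain the cross-validated estimate.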
Today, urban traffic, growing populations, and dense transportation networks are contributing to an increase in traffic incidents. These incidents include traffic accidents, vehicle breakdowns, fires, and traffic disputes, and they result in long waiting times, high carbon emissions, and other undesirable situations. Estimating incident response times quickly and accurately after traffic incidents occur is vital for the success of incident-related planning and response activities. This study presents a model for forecasting the duration of traffic incidents with high precision. The proposed model goes through a four-stage process that uses various features to predict the duration of four different traffic events, and it presents a feature reduction approach to enable real-time data collection and prediction. In the first stage, a dataset consisting of 24,431 data points and 75 variables is prepared through data collection, merging, missing data processing, and data cleaning. In the second stage, models such as Decision Trees (DT), K-Nearest Neighbour (KNN), Random Forest (RF), and Support Vector Machines (SVM) are used, and hyperparameter optimisation is performed with GridSearchCV. In the third stage, feature selection and reduction are performed and real-time data are used. In the last stage, model performance with 14 variables is evaluated with metrics such as accuracy, precision, recall, F1-score, Matthews correlation coefficient (MCC), and the confusion matrix, together with SHAP. The RF model outperforms the other models with an accuracy of 98.5%. The prediction results demonstrate that the proposed dynamic prediction model can achieve a high level of success.
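The evaluation metrics listed in the abstract above (accuracy, precision, recall, F1-score, MCC) can all be derived from confusion-matrix counts. The sketch below covers the binary case only, as a simplifying assumption; the study predicts several classes and its exact averaging scheme is not stated. The function name `binary_metrics` and the counts are hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for one class treated as "positive".
for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative on imbalanced data where accuracy alone can look deceptively high.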
Boosting algorithms have been widely utilized in the development of landslide susceptibility mapping (LSM) studies. However, these algorithms possess distinct computational strategies and hyperparameters, making it challenging to propose an ideal LSM model. To investigate the impact of different boosting algorithms and hyperparameter optimization algorithms on LSM, this study constructed a geospatial database comprising 12 conditioning factors, such as elevation, stratum, and annual average rainfall. The XGBoost (XGB), LightGBM (LGBM), and CatBoost (CB) algorithms were employed to construct the LSM model. Furthermore, the Bayesian optimization (BO), particle swarm optimization (PSO), and Hyperband optimization (HO) algorithms were applied to optimize the LSM model. The boosting algorithms exhibited varying performances, with CB demonstrating the highest precision, followed by LGBM, and XGB showing poorer precision. Additionally, the hyperparameter optimization algorithms displayed different performances, with HO outperforming PSO and BO showing poorer performance. The HO-CB model achieved the highest precision, with an accuracy of 0.764, an F1-score of 0.777, an area under the curve (AUC) value of 0.837 for the training set, and an AUC value of 0.863 for the test set. The model was interpreted using SHapley Additive exPlanations (SHAP), revealing that slope, curvature, topographic wetness index (TWI), degree of relief, and elevation significantly influenced landslides in the study area. This study offers a scientific reference for LSM and disaster prevention research. It examines the utilization of various boosting algorithms and hyperparameter optimization algorithms in Wanzhou District and proposes the HO-CB-SHAP framework as an effective approach to accurately forecast landslide disasters and interpret LSM models. However, limitations exist concerning the generalizability of the model and the data processing, which require further exploration in subsequent studies.
Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable. Irrelevant input features decrease the performance and interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS), and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_(max)). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China, using earth pressure balance (EPB) shields, and consists of 14 input features and a single output (i.e. S_(max)). The ML model trained on the features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions, and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low magnitudes of face pressure at the top of the shield increase the model's output.
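Of the four feature selection techniques named in the abstract above, the Pearson correlation method is the simplest to illustrate. The sketch below shows one common way to use it as a filter: keep the features whose absolute correlation with the target reaches a threshold. The study may have applied the method differently (for example, to drop mutually correlated inputs). The feature names, data values, and the 0.3 threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / sqrt(sxx * syy)


def filter_features(features, target, threshold=0.3):
    """Keep feature names whose |r| with the target is at least threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


# Hypothetical input columns and settlement values for five tunnel rings.
features = {
    "cover_depth": [10.0, 12.0, 14.0, 16.0, 18.0],
    "face_pressure": [2.1, 1.8, 1.9, 1.4, 1.2],
    "ring_parity": [0.0, 1.0, 0.0, 1.0, 0.0],
}
settlement = [8.0, 11.0, 12.5, 17.0, 19.5]
print(filter_features(features, settlement))
```

A filter of this kind only captures linear, one-feature-at-a-time relationships, which is one reason wrapper methods such as SFS, SBS, and Boruta are compared against it.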
Ultrasonic testing (UT) is increasingly combined with machine learning (ML) techniques for intelligently identifying damage. Extracting significant features from UT data is essential for efficient defect characterization. Moreover, the hidden physics behind ML is unexplained, which reduces the generalization capability and versatility of ML methods in UT. In this paper, a generally applicable ML framework based on a model interpretation strategy is proposed to improve the detection accuracy and computational efficiency of UT. Firstly, multi-domain features are extracted from the UT signals with signal processing techniques to construct an initial feature space. Subsequently, a feature selection method based on a model interpretable strategy (FS-MIS) is developed by integrating Shapley additive explanation (SHAP), the filter method, the embedded method, and the wrapper method. The most effective ML model and the optimal feature subset with better correlation to the target defects are determined self-adaptively. The proposed framework is validated by identifying and locating side-drilled holes (SDHs) with a 0.5λ central distance and different depths. An ultrasonic array probe is adopted to acquire FMC datasets experimentally from several aluminum alloy specimens containing two SDHs. The optimal feature subset selected by FS-MIS is set as the input of the chosen ML model to train and predict the times of arrival (ToAs) of the scattered waves emitted by adjacent SDHs. The experimental results demonstrate that the relative errors of the predicted ToAs are all below 3.67%, with an average error of 0.25%, significantly improving the time resolution of UT signals. On this basis, the predicted ToAs are assigned to the corresponding original signals for decoupling overlapped pulse-echoes and reconstructing high-resolution FMC datasets. The imaging resolution is enhanced to 0.5λ by implementing the total focusing method (TFM). The relative errors of hole depths and central distance are no more than 0.51% and 3.57%, respectively. Finally, the superior performance of the proposed FS-MIS is validated by comparing it with the initial feature space and conventional dimensionality reduction techniques.
Cybersecurity increasingly relies on machine learning (ML) models to detect and respond to attacks. However, the rapidly changing data environment makes model life-cycle management after deployment essential. Real-time detection of drift signals from various threats is fundamental for effectively managing deployed models. However, detecting drift in unsupervised environments can be challenging. This study introduces a novel approach leveraging Shapley additive explanations (SHAP), a widely recognized explainability technique in ML, to address drift detection in unsupervised settings. The proposed method incorporates a range of plots and statistical techniques to enhance drift detection reliability and introduces a drift suspicion metric that considers the explanatory aspects absent in current approaches. To validate the effectiveness of the proposed approach in a real-world scenario, we applied it to an environment designed to detect domain generation algorithms (DGAs). The dataset was obtained from various types of DGAs provided by NetLab. Based on this dataset composition, we sought to validate the proposed SHAP-based approach through drift scenarios that occur when a previously deployed model encounters new data types in an environment that detects real-world DGAs. The results revealed that more than 90% of the drift data exceeded the threshold, demonstrating the high reliability of the approach in detecting drift in an unsupervised environment. The proposed method distinguishes itself from existing approaches by employing explainable artificial intelligence (XAI)-based detection, which is not limited by model or system environment constraints. In conclusion, this paper proposes a novel approach to detect drift in unsupervised ML settings for cybersecurity. The proposed method employs SHAP-based XAI and a drift suspicion metric to improve drift detection reliability. It is versatile and suitable for various real-time data analysis contexts beyond DGA detection environments. This study contributes to the ML community by addressing the critical issue of managing ML models in real-world cybersecurity settings. Our approach is distinguishable from existing techniques by employing XAI-based detection, which is not limited by model or system environment constraints. As a result, our method can be applied in critical domains that require adaptation to continuous changes, such as cybersecurity. Through extensive validation across diverse settings beyond DGA detection environments, the proposed method is expected to emerge as a versatile drift detection technique suitable for a wide range of real-time data analysis contexts. It is also anticipated to emerge as a new approach to protect essential systems and infrastructures from attacks.
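Objective: To establish a machine learning model for predicting the mortality risk of patients with severe chronic obstructive pulmonary disease (COPD), to explore and explain the factors associated with mortality risk in COPD patients, and to address the "black box" problem of machine learning models. Methods: A total of 8088 patients with severe COPD from the US multicenter emergency intensive care unit (eICU) database were selected as study subjects. Data from the first 24 h of each intensive care unit admission were extracted and randomly divided, with 70% used for model training and 30% for model validation. LASSO regression was used to select predictor variables and avoid overfitting. Five machine learning models were used to predict in-hospital mortality. The predictive performance of the five models and the APACHE IVa score was compared using the area under the curve (AUC), and the SHAP (SHapley Additive exPlanations) method was used to explain the predictions of the random forest (RF) model. Results: The RF model showed the best performance among the five machine learning models and the APACHE IVa scoring system, with an AUC of 0.830 (95% confidence interval 0.806 to 0.855). The SHAP method identified the 10 most important predictor variables, among which the minimum noninvasive systolic blood pressure was considered the most important. Conclusion: Identifying risk factors through machine learning and using the SHAP method to explain the predictions enables early prediction of patients' mortality risk and helps clinicians formulate accurate treatment plans and allocate medical resources rationally.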
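The area under the ROC curve (AUC) used above to compare the models equals the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it directly from that definition. The function name `auc`, the labels, and the scores are illustrative assumptions, and the quadratic loop is only suitable for small samples.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died in hospital) and predicted risks.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(auc(outcomes, risks))
```

Because it depends only on the ranking of the scores, AUC allows a probabilistic model and a point-based clinical score to be compared on the same footing.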
Landslide inventory is an indispensable output variable of landslide susceptibility prediction (LSP) modelling. However, the influence of landslide inventory incompleteness on LSP and the transfer rules of the resulting LSP errors in the model have not been explored. Adopting Xunwu County, China, as an example, the existing landslide inventory is first obtained and assumed to contain all landslide inventory samples under ideal conditions, after which different conditions of missing landslide inventory samples are simulated by random sampling. These include the condition that landslide inventory samples across the whole study area are missing randomly at proportions of 10%, 20%, 30%, 40%, and 50%, as well as the condition that landslide inventory samples in the south of Xunwu County are missing in aggregation. Then, five machine learning models, including Random Forest (RF) and Support Vector Machine (SVM), are used to perform LSP. Finally, the LSP results are evaluated to analyze the LSP uncertainties under the various conditions. In addition, this study introduces various interpretability methods for machine learning models to explore the changes in the decision basis of the RF model under the various conditions. Results show that (1) randomly missing landslide inventory samples at certain proportions (10%–50%) may affect the LSP results for local areas. (2) Aggregated missing landslide inventory samples may cause significant biases in LSP, particularly in areas where samples are missing. (3) When 50% of landslide samples are missing (either randomly or in aggregation), the changes in the decision basis of the RF model are mainly manifested in two aspects: first, the importance ranking of environmental factors differs slightly; second, for LSP modelling in the same test grid unit, the weights of individual model factors may vary drastically.
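The two missing-sample conditions described in the abstract above, random loss at a fixed proportion and aggregated loss within one region, can be simulated along the following lines. This is a sketch under stated assumptions: the helper names `drop_random` and `drop_where`, the seed, the synthetic grid coordinates, and the rule that defines "south" are all hypothetical and are not taken from the study.

```python
import random


def drop_random(samples, proportion, seed=0):
    """Return samples with the given proportion removed uniformly at random."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    n_drop = round(len(samples) * proportion)
    rng = random.Random(seed)
    dropped = set(rng.sample(range(len(samples)), n_drop))
    return [s for i, s in enumerate(samples) if i not in dropped]


def drop_where(samples, in_region):
    """Return samples with every sample inside a region removed."""
    return [s for s in samples if not in_region(s)]


# Hypothetical inventory: (x, y) grid coordinates of 100 landslides.
inventory = [(i % 10, i // 10) for i in range(100)]

for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, len(drop_random(inventory, p)))

# Aggregated loss: assume rows with y < 3 represent the southern area.
southern_removed = drop_where(inventory, lambda s: s[1] < 3)
print(len(southern_removed))
```

Fixing the seed keeps each degraded inventory reproducible, so differences between model runs can be attributed to the missing samples rather than to the sampling itself.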
L1_(2) phase-strengthened Fe-Co-Ni-based high-entropy alloys (HEAs) have attracted considerable attention due to their excellent mechanical properties. Improving the properties of HEAs through conventional experimental methods is costly. Therefore, a new method is needed to predict the properties of alloys quickly and accurately. In this study, a comprehensive prediction model for L1_(2) phase-strengthened Fe-Co-Ni-based HEAs was developed. The existence of the L1_(2) phase in the HEAs was first predicted. A link was then established between the microstructure (L1_(2) phase volume fraction) and properties (hardness) of HEAs, and comprehensive prediction was performed. Finally, two mutually exclusive properties (strength and plasticity) of HEAs were coupled and co-optimized. The Shapley additive explanations (SHAP) algorithm was also used to interpret the contribution of each model feature to the comprehensive properties of HEAs. The vast compositional and process search space of HEAs was progressively screened in three stages by applying different prediction models. Four HEAs were ultimately screened from hundreds of thousands of possible candidate groups, and the prediction results were verified by experiments. In this work, L1_(2) phase-strengthened Fe-Co-Ni-based HEAs with high strength and plasticity were successfully designed. The new method presented herein has a great cost advantage over traditional experimental methods. It is also expected to be applied in the design of HEAs with various excellent properties or to explore the potential factors affecting the microstructure/properties of alloys.
Evidence from animal experiments has shown that chlorinated polyfluoroalkyl ether sulfonic acids (Cl-PFESAs) can induce vision dysfunction in zebrafish. However, environmental epidemiological evidence supporting this hypothesis remains limited. In our case-control study, samples collected from 270 individuals (135 controls and 135 cases) from the Isomers of C8 Health Project data were analyzed for Cl-PFESAs. We also repeated our analysis on zebrafish to support our findings in humans and to decipher the mechanism underlying Cl-PFESA eye toxicity. The serum levels of per- and polyfluoroalkyl substances (PFASs) and alternatives were significantly higher in the cases than in the controls. Higher serum Cl-PFESA levels were associated with greater odds of eye diseases, and the trend showed a statistically significant dose-dependent relationship. The Shapley additive explanations (SHAP) value indicated that 8:2 Cl-PFESA was the dominant eye disease risk factor among the 13 studied PFASs. In zebrafish experiments, Cl-PFESAs induced eye toxicity in adult zebrafish through oxidative damage and cell apoptosis. Compared with the control group, the thicknesses of the inner plexiform layer (IPL), outer plexiform layer (OPL), and retinal tissue were significantly reduced in the zebrafish exposed to Cl-PFESAs. Our study provides human clinical and animal experimental data showing that exposure to PFASs increases the odds of developing eye toxicity.
Cervical spondylotic myelopathy (CSM) is the main cause of adult spinal cord dysfunction, mostly appearing in middle-aged and elderly patients. Currently, the diagnosis of this condition depends mainly on the available imaging tools such as X-ray, computed tomography, and magnetic resonance imaging (MRI), of which MRI is the gold standard for clinical diagnosis. However, MRI data cannot clearly demonstrate the dynamic characteristics of CSM, and the overall process is far from cost-efficient. Therefore, this study proposes a new method using multiple gait parameters and shallow classifiers to dynamically detect the occurrence of CSM. In the present study, 45 patients with CSM and 45 age-matched asymptomatic healthy controls (HCs) were recruited, and a three-dimensional (3D) motion capture system was utilized to capture the locomotion data. Furthermore, 63 spatiotemporal, kinematic, and nonlinear parameters were extracted, including lower limb joint angles in the sagittal, coronal, and transverse planes. Then, the Shapley Additive exPlanations (SHAP) value was utilized for feature selection and reduction of the dimensionality of features, and five traditional shallow classifiers, including support vector machine (SVM), logistic regression (LR), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF), were used to classify gait patterns between CSM patients and HCs. On the basis of the 10-fold cross-validation method, the highest average accuracy was achieved by SVM (95.56%). Our results demonstrated that the proposed method could effectively detect CSM and thus serve as an automated auxiliary tool for the clinical diagnosis of CSM.
To extract strong correlations between different energy loads and improve the interpretability and accuracy of load forecasting for a regional integrated energy system (RIES), an explainable framework for RIES load forecasting is proposed. The framework includes the RIES load forecasting model and its interpretation. A coupled feature extraction strategy is adopted to construct coupled features between loads as the input variables of the model. The model is designed based on multi-task learning (MTL), with a long short-term memory (LSTM) model as the sharing layer. Based on SHapley Additive exPlanations (SHAP), the explainable framework combines global and local interpretations to improve the interpretability of RIES load forecasting. In addition, an input variable selection strategy based on the global SHAP value is proposed to select the input feature variables of the model. A case study is given to verify the effectiveness of the proposed model, the constructed coupled features, and the input variable selection strategy. The results show that the explainable framework intuitively improves the interpretability of the prediction model.
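The local and global SHAP interpretations mentioned above both rest on Shapley values. For a handful of features these can be computed exactly by enumerating every coalition, as in the Python sketch below. It is an illustration only: it replaces absent features with a single baseline value, which is an assumption, and it is not the algorithm used by the `shap` library or by the study (both would rely on faster approximations). The names `shapley_values`, `global_importance`, and `toy_model` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input."""
    n = len(x)

    def value(coalition):
        # Features outside the coalition are replaced by their baseline value.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def global_importance(predict, rows, baseline):
    """Mean absolute Shapley value per feature over all rows."""
    totals = [0.0] * len(baseline)
    for row in rows:
        for j, v in enumerate(shapley_values(predict, row, baseline)):
            totals[j] += abs(v)
    return [t / len(rows) for t in totals]


def toy_model(z):
    """Hypothetical stand-in for a trained forecasting model."""
    return 2.0 * z[0] + 3.0 * z[1] - z[2]


rows = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
baseline = [0.0, 0.0, 0.0]
print(shapley_values(toy_model, rows[0], baseline))   # local explanation
print(global_importance(toy_model, rows, baseline))   # global ranking
```

Ranking features by the global score and keeping the top few is one plausible reading of a selection strategy based on the global SHAP value. The cost grows as 2^n in the number of features, which is why practical tools approximate these sums.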
Funding (LF molten steel temperature study): Financially supported by the National Natural Science Foundation of China (Nos. 51974023 and 52374321), the State Key Laboratory of Advanced Metallurgy, University of Science and Technology Beijing (No. 41621005), and the Youth Science and Technology Innovation Fund of Jianlong Group-University of Science and Technology Beijing (No. 20231235).
Funding (Coffee Berry Disease detection study): Support from the Deanship for Research & Innovation, Ministry of Education in Saudi Arabia, under the auspices of Project Number IFP22UQU4281768DSR122.
Funding (landslide susceptibility mapping study): Funded by the Natural Science Foundation of Chongqing (Grant No. CSTB2022NSCQ-MSX0594) and the Humanities and Social Sciences Research Project of the Ministry of Education (Grant No. 16YJCZH061).
Funding (shield tunneling settlement study): Support provided by The Science and Technology Development Fund, Macao SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023) and the Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).
Abstract: Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable, and irrelevant input features decrease both the performance and the interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is ignored in most studies on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e. S_max). The ML model trained on features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low magnitudes of face pressure at the top of the shield increase the model's output.
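Several abstracts in this listing rely on SHAP to attribute a model's output to its input features. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for one sample by enumerating every feature coalition. It is not the method of any listed paper or the `shap` library's API; the function `toy_settlement_model`, its coefficients, the feature names, and the sample and baseline values are all hypothetical, and "absent" features are simply set to a baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, enumerating all feature coalitions.

    Features inside a coalition take the sample's value; all others are held
    at the baseline value (an assumed way of representing "absent" features).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_settlement_model(features):
    # Hypothetical stand-in for a trained model; coefficients are made up.
    face_pressure, cover_depth, advance_rate = features
    return 30.0 - 8.0 * face_pressure + 0.5 * cover_depth + 2.0 * face_pressure * advance_rate


x = [1.0, 10.0, 2.0]          # hypothetical sample to explain
baseline = [2.0, 12.0, 1.0]   # hypothetical reference values
phi = shapley_values(toy_settlement_model, x, baseline)
```

The attributions add up to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that gives SHAP its name. Exhaustive enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.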
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U22B2068, 52275520, 52075078) and the National Key Research and Development Program of China (Grant No. 2019YFA0709003).
Abstract: Ultrasonic testing (UT) is increasingly combined with machine learning (ML) techniques for intelligently identifying damage. Extracting significant features from UT data is essential for efficient defect characterization. Moreover, the hidden physics behind ML is unexplained, reducing the generalization capability and versatility of ML methods in UT. In this paper, a generally applicable ML framework based on a model interpretation strategy is proposed to improve the detection accuracy and computational efficiency of UT. First, multi-domain features are extracted from the UT signals with signal processing techniques to construct an initial feature space. Subsequently, a feature selection method based on model interpretable strategy (FS-MIS) is developed by integrating Shapley additive explanation (SHAP), the filter method, the embedded method and the wrapper method. The most effective ML model and the optimal feature subset with better correlation to the target defects are determined self-adaptively. The proposed framework is validated by identifying and locating side-drilled holes (SDHs) with 0.5λ central distance and different depths. An ultrasonic array probe is used to acquire FMC datasets experimentally from several aluminum alloy specimens containing two SDHs. The optimal feature subset selected by FS-MIS is set as the input of the chosen ML model to train and predict the times of arrival (ToAs) of the scattered waves emitted by adjacent SDHs. The experimental results demonstrate that the relative errors of the predicted ToAs are all below 3.67% with an average error of 0.25%, significantly improving the time resolution of UT signals. On this basis, the predicted ToAs are assigned to the corresponding original signals for decoupling overlapped pulse-echoes and reconstructing high-resolution FMC datasets. The imaging resolution is enhanced to 0.5λ by implementing the total focusing method (TFM). The relative errors of hole depths and central distance are no more than 0.51% and 3.57%, respectively. Finally, the superior performance of the proposed FS-MIS is validated by comparing it with the initial feature space and conventional dimensionality reduction techniques.
Funding: Supported by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2022-0-00089, Development of clustering and analysis technology to identify cyber attack groups based on life cycle) and by the Institute of Civil Military Technology Cooperation funded by the Defense Acquisition Program Administration and the Ministry of Trade, Industry and Energy of the Korean government under Grant No. 21-CM-EC-07.
Abstract: Cybersecurity increasingly relies on machine learning (ML) models to detect and respond to attacks. However, the rapidly changing data environment makes model life-cycle management after deployment essential. Real-time detection of drift signals from various threats is fundamental for effectively managing deployed models, yet detecting drift in unsupervised environments can be challenging. This study introduces a novel approach leveraging Shapley additive explanations (SHAP), a widely recognized explainability technique in ML, to address drift detection in unsupervised settings. The proposed method incorporates a range of plots and statistical techniques to enhance drift detection reliability and introduces a drift suspicion metric that considers the explanatory aspects absent in current approaches. To validate the effectiveness of the proposed approach in a real-world scenario, we applied it to an environment designed to detect domain generation algorithms (DGAs). The dataset was obtained from various types of DGAs provided by NetLab. Based on this dataset composition, we sought to validate the proposed SHAP-based approach through drift scenarios that occur when a previously deployed model encounters new data types in an environment that detects real-world DGAs. The results revealed that more than 90% of the drift data exceeded the threshold, demonstrating the high reliability of the approach to detect drift in an unsupervised environment. The proposed method distinguishes itself from existing approaches by employing explainable artificial intelligence (XAI)-based detection, which is not limited by model or system environment constraints. In conclusion, this paper proposes a novel approach to detect drift in unsupervised ML settings for cybersecurity. The proposed method employs SHAP-based XAI and a drift suspicion metric to improve drift detection reliability. It is versatile and suitable for various real-time data analysis contexts beyond DGA detection environments. This study significantly contributes to the ML community by addressing the critical issue of managing ML models in real-world cybersecurity settings. Our approach is distinguishable from existing techniques by employing XAI-based detection, which is not limited by model or system environment constraints. As a result, our method can be applied in critical domains that require adaptation to continuous changes, such as cybersecurity. Through extensive validation across diverse settings beyond DGA detection environments, the proposed method will emerge as a versatile drift detection technique suitable for a wide range of real-time data analysis contexts. It is also anticipated to emerge as a new approach to protect essential systems and infrastructures from attacks.
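Abstract: Objective: To establish a machine learning model for predicting the risk of death in patients with severe chronic obstructive pulmonary disease (COPD), to explore and explain the factors associated with that risk, and to address the "black box" problem of machine learning models. Methods: A total of 8088 patients with severe COPD from the US multicenter emergency intensive care unit (eICU) database were selected as study subjects. Data from the first 24 h of each intensive care unit admission were extracted and randomly split, with 70% used for model training and 30% for model validation. LASSO regression was used to select predictor variables and avoid overfitting. Five machine learning models were used to predict in-hospital mortality. The predictive performance of the five models and the APACHE IVa score was compared by area under the curve (AUC), and the SHAP (SHapley Additive exPlanations) method was used to explain the predictions of the random forest (RF) model. Results: The RF model showed the best performance among the five machine learning models and the APACHE IVa scoring system, with an AUC of 0.830 (95% confidence interval 0.806 to 0.855). The SHAP method identified the 10 most important predictor variables, among which the minimum non-invasive systolic blood pressure was considered the most important. Conclusion: Identifying risk factors through machine learning and explaining the predictions with the SHAP method allows early prediction of patients' risk of death, helping clinicians to formulate accurate treatment plans and allocate medical resources rationally.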
Funding: Supported by the National Natural Science Foundation of China (Nos. 42377164, 41972280 and 42272326); the National Natural Science Outstanding Youth Foundation of China (No. 52222905); and the Natural Science Foundation of Jiangxi Province, China (Nos. 20232BAB204091 and 20232BAB204077).
Abstract: Landslide inventory is an indispensable output variable of landslide susceptibility prediction (LSP) modelling. However, the influence of landslide inventory incompleteness on LSP and the transfer rules of the resulting LSP error in the model have not been explored. Taking Xunwu County, China, as an example, the existing landslide inventory is first obtained and assumed to contain all landslide inventory samples under ideal conditions, after which different conditions of missing landslide inventory samples are simulated by random sampling. These include the condition that landslide inventory samples across the whole study area are missing randomly at proportions of 10%, 20%, 30%, 40% and 50%, as well as the condition that landslide inventory samples in the south of Xunwu County are missing in aggregation. Then, five machine learning models, including Random Forest (RF) and Support Vector Machine (SVM), are used to perform LSP. Finally, the LSP results are evaluated to analyze the LSP uncertainties under various conditions. In addition, this study introduces various interpretability methods for machine learning models to explore the changes in the decision basis of the RF model under various conditions. Results show that (1) randomly missing landslide inventory samples at certain proportions (10%–50%) may affect the LSP results for local areas. (2) Aggregation of missing landslide inventory samples may cause significant biases in LSP, particularly in areas where samples are missing. (3) When 50% of landslide samples are missing (either randomly or aggregated), the changes in the decision basis of the RF model are mainly manifested in two aspects: first, the importance ranking of environmental factors slightly differs; second, for LSP modelling in the same test grid unit, the weights of individual model factors may vary drastically.
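The random-missing scenarios described in the abstract above can be pictured with a short sketch. This is only an illustration under stated assumptions, not the study's procedure: the helper `simulate_missing_inventory`, the inventory of 500 integer sample IDs, and the seed are all hypothetical, and the aggregated (southern-area) scenario is not modelled here.

```python
import random


def simulate_missing_inventory(sample_ids, missing_fraction, seed=0):
    """Return the inventory that remains after randomly dropping a fraction of samples."""
    rng = random.Random(seed)  # fixed seed so each scenario is reproducible
    n_missing = round(len(sample_ids) * missing_fraction)
    missing = set(rng.sample(sample_ids, n_missing))
    return [s for s in sample_ids if s not in missing]


# Hypothetical inventory of 500 landslide sample IDs (the real count is not given).
inventory = list(range(1, 501))

# One reduced inventory per missing proportion named in the abstract.
scenarios = {
    fraction: simulate_missing_inventory(inventory, fraction, seed=42)
    for fraction in (0.1, 0.2, 0.3, 0.4, 0.5)
}
```

Each reduced inventory would then be used to train a susceptibility model, so that its output can be compared against the model trained on the complete inventory.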
Funding: Supported by the National Natural Science Foundation of China (Nos. 52161011, 52373236); the Natural Science Foundation of Guangxi Province (2023GXNSFDA026046); the Guangxi Science and Technology Project (Guike AB24010247); the Central Guiding Local Science and Technology Development Fund Projects (Guike ZY23055005); the Scientific Research and Technology Development Program of Guilin (20220110-3); the Scientific Research and Technology Development Program of Nanning Jiangnan District (20230715-02); the Guangxi Key Laboratory of Superhard Material (2022-K-001); the Guangxi Key Laboratory of Information Materials (231003-Z, 231013-Z and 231033-K); the Engineering Research Center of Electronic Information Materials and Devices, Ministry of Education (EIMD-AB202009); the Major Research Plan of the National Natural Science Foundation of China (92166112); the Innovation Project of GUET Graduate Education (2022YCXS200); the Projects of MOE Key Lab of Disaster Forecast and Control in Engineering in Jinan University (20200904006); the Guangdong Province International Science and Technology Cooperation Project (2023A0505050103); and the Open Project Program of Wuhan National Laboratory for Optoelectronics (2021WNLOKF010).
Abstract: L1₂ phase-strengthened Fe-Co-Ni-based high-entropy alloys (HEAs) have attracted considerable attention due to their excellent mechanical properties. Improving the properties of HEAs through conventional experimental methods is costly. Therefore, a new method is needed to predict the properties of alloys quickly and accurately. In this study, a comprehensive prediction model for L1₂ phase-strengthened Fe-Co-Ni-based HEAs was developed. The existence of the L1₂ phase in the HEAs was first predicted. A link was then established between the microstructure (L1₂ phase volume fraction) and properties (hardness) of HEAs, and comprehensive prediction was performed. Finally, two mutually exclusive properties (strength and plasticity) of HEAs were coupled and co-optimized. The Shapley additive explanations algorithm was also used to interpret the contribution of each model feature to the comprehensive properties of HEAs. The vast compositional and process search space of HEAs was progressively screened in three stages by applying different prediction models. In the end, four HEAs were screened from hundreds of thousands of possible candidate groups, and the prediction results were verified by experiments. In this work, L1₂ phase-strengthened Fe-Co-Ni-based HEAs with high strength and plasticity were successfully designed. The new method presented herein has a great cost advantage over traditional experimental methods. It is also expected to be applied in the design of HEAs with various excellent properties or to explore the potential factors affecting the microstructure and properties of alloys.
Funding: Supported by the National Key Research and Development Program of China (2018YFC1004300, 2018YFC1004301, and 2018YFE0106900); the National Natural Science Foundation of China (82173471, 82003409, 82103823, and 82073503); the Natural Science Foundation of Guangdong Province (2021A1515012212, 2021A1515011754, 2021B1515020015, 2020A1515011131, 2019A050510017, 2018B05052007, and 2017A090905042); and the Guangxi Key Research and Development Plan (GUIKEAB18050024).
Abstract: Evidence from animal experiments has shown that chlorinated polyfluoroalkyl ether sulfonic acids (Cl-PFESAs) can induce vision dysfunction in zebrafish. However, environmental epidemiological evidence supporting this hypothesis remains limited. In our case-control study, samples collected from 270 individuals (135 controls and 135 cases) from the Isomers of C8 Health Project data were analyzed for Cl-PFESAs. We also repeated our analysis on zebrafish to support our findings in humans and to decipher the mechanism underlying Cl-PFESA eye toxicity. The serum levels of per- and polyfluoroalkyl substances (PFASs) and alternatives were significantly higher in the cases than in the controls. Higher serum Cl-PFESA levels were associated with greater odds of eye diseases, and the trend showed a statistically significant dose-dependent relationship. The Shapley additive explanations (SHAP) value indicated that 8:2 Cl-PFESA was the dominant eye disease risk factor among the 13 studied PFASs. In zebrafish experiments, Cl-PFESAs induced eye toxicity in adult zebrafish through oxidative damage and cell apoptosis. Compared with the control group, the thicknesses of the inner plexiform layer (IPL), outer plexiform layer (OPL), and retinal tissue were significantly reduced in the zebrafish exposed to Cl-PFESAs. Our study provides human clinical and animal experimental data showing that exposure to PFASs increases the odds of developing eye toxicity.
Funding: Supported by the National Natural Science Foundation of China (62173212).
Abstract: Cervical spondylotic myelopathy (CSM) is the main cause of adult spinal cord dysfunction, mostly appearing in middle-aged and elderly patients. Currently, the diagnosis of this condition depends mainly on the available imaging tools such as X-ray, computed tomography and magnetic resonance imaging (MRI), of which MRI is the gold standard for clinical diagnosis. However, MRI data cannot clearly demonstrate the dynamic characteristics of CSM, and the overall process is far from cost-efficient. Therefore, this study proposes a new method using multiple gait parameters and shallow classifiers to dynamically detect the occurrence of CSM. In the present study, 45 patients with CSM and 45 age-matched asymptomatic healthy controls (HCs) were recruited, and a three-dimensional (3D) motion capture system was used to capture the locomotion data. Furthermore, 63 spatiotemporal, kinematic, and nonlinear parameters were extracted, including lower limb joint angles in the sagittal, coronal, and transverse planes. Then, the Shapley Additive exPlanations (SHAP) value was used for feature selection and dimensionality reduction, and five traditional shallow classifiers, including support vector machine (SVM), logistic regression (LR), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF), were used to classify gait patterns between CSM patients and HCs. Under 10-fold cross-validation, the highest average accuracy was achieved by SVM (95.56%). Our results demonstrated that the proposed method could effectively detect CSM and thus serve as an automated auxiliary tool for the clinical diagnosis of CSM.
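The 10-fold cross-validation mentioned in the abstract above can be sketched as an index-splitting routine. This is a minimal illustration, assuming a plain shuffled split of the 90 participants (45 CSM, 45 HC); the abstract does not say whether the folds were stratified or how they were drawn, and the helper name `k_fold_splits` and the seed are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


# 45 CSM patients + 45 healthy controls = 90 samples, as stated in the abstract.
splits = k_fold_splits(90, k=10, seed=1)
```

Each sample appears in exactly one test fold, so a classifier is trained ten times on 81 samples and evaluated on the remaining 9; the reported average accuracy would be the mean of the ten fold accuracies.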
Funding: Supported in part by the National Key Research Program of China (2016YFB0900100) and the Key Project of Shanghai Science and Technology Committee (18DZ1100303).
Abstract: To extract strong correlations between different energy loads and improve the interpretability and accuracy of load forecasting for a regional integrated energy system (RIES), an explainable framework for load forecasting of an RIES is proposed. This includes the load forecasting model of the RIES and its interpretation. A coupled feature extraction strategy is adopted to construct coupled features between loads as the input variables of the model. The model is designed based on multi-task learning (MTL) with a long short-term memory (LSTM) model as the sharing layer. Based on SHapley Additive exPlanations (SHAP), this explainable framework combines global and local interpretations to improve the interpretability of load forecasting of the RIES. In addition, an input variable selection strategy based on the global SHAP value is proposed to select the input feature variables of the model. A case study is given to verify the effectiveness of the proposed model, the constructed coupled features, and the input variable selection strategy. The results show that the explainable framework intuitively improves the interpretability of the prediction model.