Objective: To establish a stroke prediction and feature analysis model integrating XGBoost and SHAP to aid the clinical diagnosis and prevention of stroke. Methods: Using an open dataset from Kaggle, together with data preprocessing and grid-search parameter optimization, we built an interpretable stroke risk prediction model integrating XGBoost and SHAP and performed an explanatory analysis of risk factors. Results: The XGBoost model's accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) were 96.71%, 93.83%, 99.59%, and 99.19%, respectively. The explanatory analysis showed that age, type of residence, and history of hypertension were key factors associated with stroke incidence. Conclusion: On this dataset, the established model can be used to identify stroke, and the SHAP-based explanatory analysis increases the model's transparency and helps medical practitioners assess its reliability.
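The four reported metrics follow standard definitions: accuracy, sensitivity (true-positive rate), and specificity (true-negative rate) come from the confusion matrix at a chosen decision threshold, while AUC is threshold-free and equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal plain-Python sketch of those definitions, assuming binary labels with 1 = stroke and a 0.5 threshold; the labels and scores are hypothetical toy values, not the study's data, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = stroke, 0 = no stroke."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Threshold-dependent metrics computed from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (not the study's data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise AUC loop is O(P·N) and is meant only to make the definition explicit; library routines such as scikit-learn's `roc_auc_score` compute the same quantity efficiently on real data.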
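SHAP explains an individual prediction by assigning each feature its Shapley value: the weighted average of the feature's marginal contribution over all coalitions of the other features, where the contributions sum to the difference between the prediction and a baseline. The following is a minimal sketch of that idea, assuming a hypothetical toy risk function (`toy_risk`, with made-up coefficients and an assumed `urban` flag standing in for residence type) and a single baseline record whose values fill in features outside the coalition. It enumerates coalitions by brute force, which is exponential in the number of features; it is not the `shap` library's implementation, which uses TreeSHAP, an algorithm specialized to tree ensembles such as XGBoost.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features inside a coalition take the instance's values; features outside
    it are held at the (assumed) baseline values.
    """
    features = list(instance)
    n = len(features)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(x):
    """Hypothetical risk score with an age-by-hypertension interaction (illustration only)."""
    return (0.02 * x["age"] + 0.5 * x["hypertension"] + 0.1 * x["urban"]
            + 0.01 * x["age"] * x["hypertension"])


instance = {"age": 70, "hypertension": 1, "urban": 1}  # hypothetical patient
baseline = {"age": 40, "hypertension": 0, "urban": 0}  # hypothetical reference record

phi = shapley_values(toy_risk, instance, baseline)
print(phi)
# Efficiency property: attributions sum to prediction minus baseline prediction.
print(sum(phi.values()), toy_risk(instance) - toy_risk(baseline))
```

Because the toy model contains an interaction term, its effect is split between `age` and `hypertension` rather than assigned to either one alone, which is the behavior SHAP summary plots rely on when ranking features such as age and hypertension history.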
Funding: the National Natural Science Foundation Project (Grant No. 61863027); the Special Research Project on High Quality Development of Innovation and Entrepreneurship Education of the Chinese Society of Higher Education (Grant No. 21CXD01); the Key R&D Plan of Jiangxi Province (Grant No. 20202BBGL73057).