Interpretable machine learning-based models in predicting prognoses in stroke patients
Abstract

Objective: To explore the value of interpretable machine learning models in predicting the prognosis of patients with acute ischemic stroke.

Methods: A total of 296 patients with acute ischemic stroke who received intravenous thrombolysis in the Department of Neurology, Zhanjiang Central Hospital, Guangdong Medical University, from March 2020 to October 2023 were enrolled. Prognosis was assessed with the modified Rankin Scale at the 3-month follow-up (scores of 0-2: good prognosis; scores of 3-6: poor prognosis). Clinical data were collected retrospectively, and independent influencing factors for prognosis were identified by multivariate logistic regression. The patients were randomly divided into a training set (n=178) and a test set (n=118) in a 3:2 ratio, and the independent influencing factors were used as feature variables to train 10 machine learning models: logistic regression, random forest, support vector machine, naive Bayes, linear discriminant analysis, mixture discriminant analysis, flexible discriminant analysis, gradient boosting machine, extreme gradient boosting (XGBoost), and categorical boosting (CatBoost). The predictive performance of the 10 models was evaluated using the calibration curve, precision-recall curve, precision-recall gain curve, and receiver operating characteristic (ROC) curve. Shapley Additive exPlanation (SHAP) was used to add interpretation and visualization to the models, covering both global and local interpretation.

Results: Of the 296 patients, 72 had a poor prognosis. Age (OR=1.039, 95%CI: 1.008-1.072, P=0.015), National Institutes of Health Stroke Scale (NIHSS) score (OR=1.213, 95%CI: 1.000-1.337, P<0.001), Glasgow Coma Scale (GCS) score (OR=0.470, 95%CI: 0.289-0.765, P=0.002), Stroke Prognostic Instrument-Ⅱ (SPI-Ⅱ) score (OR=1.257, 95%CI: 1.043-1.516, P=0.016), C-reactive protein level (OR=1.709, 95%CI: 1.398-2.087, P<0.001), and platelet count (OR=0.988, 95%CI: 0.978-0.998, P=0.016) were independent influencing factors for prognosis. Among the 10 machine learning models, the calibration curve (C-index: 0.896), precision-recall curve (area under the curve [AUC]: 0.791), precision-recall gain curve (AUC: 0.363), and ROC curve (AUC: 0.856) in both the training and test sets indicated that the XGBoost model had the best performance in predicting prognosis. In the global interpretation, the SHAP visualization ranked the feature variables by importance, in descending order, as C-reactive protein, NIHSS score, platelet count, GCS score, SPI-Ⅱ score, and age. The SHAP scatter plot visualized the direction of contribution of the 6 feature variables, which showed a "two-ended" (bimodal) distribution. The SHAP dependence plot showed the relationship between the observed values of the 6 feature variables and their SHAP values, with C-reactive protein showing the most pronounced trend. The SHAP force plot provided local interpretation for individual samples, making the XGBoost model more transparent and interpretable.

Conclusion: The XGBoost model using age, NIHSS score, GCS score, SPI-Ⅱ score, C-reactive protein level, and platelet count as feature variables performed best in predicting the prognosis of patients with acute ischemic stroke. Combining this model with SHAP for interpretation and visualization helps clarify the magnitude and direction of each feature variable's contribution to the prediction.
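The additive attribution behind a SHAP local explanation can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the study applied SHAP to an XGBoost model, whereas this example assumes a simple linear logit model whose coefficients are the natural logarithms of the odds ratios reported in the abstract. The intercept, the reference patient, the example patient, and all function and variable names are hypothetical, and the measurement units are not given in the abstract. Shapley values are computed exactly by enumerating every feature coalition against a single reference patient.

```python
from itertools import combinations
from math import factorial, log

# Coefficients: ln(OR) from the reported multivariate logistic regression.
COEF = {
    "age": log(1.039),
    "nihss": log(1.213),
    "gcs": log(0.470),
    "spi2": log(1.257),
    "crp": log(1.709),
    "platelets": log(0.988),
}
INTERCEPT = -2.0  # assumed value; it cancels out of every attribution

# Hypothetical reference patient (illustrative values, units not in source).
BASELINE = {"age": 65, "nihss": 5, "gcs": 15, "spi2": 3, "crp": 1.0, "platelets": 220}


def logit(x):
    """Log-odds of poor prognosis under the assumed linear model."""
    return INTERCEPT + sum(COEF[name] * x[name] for name in COEF)


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features in the coalition take the patient's values,
                # all remaining features stay at the reference values.
                coalition = {m: x[m] if m in subset else baseline[m] for m in names}
                without = predict(coalition)
                coalition[name] = x[name]
                total += weight * (predict(coalition) - without)
        phi[name] = total
    return phi


# Hypothetical patient to explain.
patient = {"age": 75, "nihss": 12, "gcs": 13, "spi2": 6, "crp": 4.0, "platelets": 180}
phi = shapley_values(logit, patient, BASELINE)

for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>9}: {value:+.3f} log-odds")
```

The contributions sum to the difference between the patient's log-odds and the reference log-odds, which is the property a force plot displays for a single sample. For a tree ensemble such as XGBoost, brute-force enumeration is normally replaced by a dedicated tree algorithm, so this sketch only shows the principle.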
Authors: Li Xinhong (李新鸿); Mai Hui (麦晖); Fu Tieyi (符铁译); Chen Jianya (陈建雅). Zhanjiang Central Hospital, Guangdong Medical University, Zhanjiang 524000, China
Source: Chinese Journal of Neuromedicine (《中华神经医学杂志》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2024, Issue 8, pp. 817-827 (11 pages)
Funding: Zhanjiang Science and Technology Plan Project (2020B01112).
Keywords: Acute ischemic stroke; Prognosis; Machine learning model; Extreme gradient boosting model; Shapley Additive exPlanation