Abstract
Objective: To construct and validate the best predictive model for 28-day mortality risk in patients with septic shock, based on different supervised machine learning algorithms.
Methods: Patients with septic shock meeting the Sepsis-3 criteria were selected from the Medical Information Mart for Intensive Care-Ⅳ v2.0 (MIMIC-Ⅳ v2.0) database and randomly allocated, with 70% forming the training set and 30% the validation set. Predictive variables were extracted at three levels: demographic characteristics and basic vital signs; serum indicators within 24 hours of intensive care unit (ICU) admission, together with comorbidities that may affect those indicators; and functional scores and advanced life support. Models built with five mainstream machine learning algorithms, namely classification and regression tree (CART), random forest (RF), support vector machine (SVM), logistic regression (LR), and super learner [SL, combining CART, RF, and extreme gradient boosting (XGBoost)], were compared for their efficacy in predicting 28-day death in patients with septic shock, and the best algorithm models were selected. The optimal predictive variables were determined by taking the intersection of the variables selected by LASSO regression, RF, and XGBoost, and a predictive model was constructed from them. Receiver operating characteristic (ROC) curves were used to validate the predictive efficacy of the model, calibration curves to assess its accuracy, and decision curve analysis (DCA) to verify its practicality.
Results: A total of 3295 patients with septic shock were included; 2164 survived and 1131 died within 28 days, a mortality of 34.32%. Of these, 2307 were in the training set (792 deaths within 28 days, mortality 34.33%) and 988 in the validation set (339 deaths within 28 days, mortality 34.31%). Five machine learning models were established on the training set. After variables from all three levels were included, the areas under the ROC curve (AUC) of the RF, SVM, and LR models for predicting 28-day death in the validation set were 0.823 [95% confidence interval (95%CI) 0.795-0.849], 0.823 (95%CI 0.796-0.849), and 0.810 (95%CI 0.782-0.838), respectively, higher than those of the CART model (AUC = 0.750, 95%CI 0.717-0.782) and the SL model (AUC = 0.756, 95%CI 0.724-0.789); these three were therefore identified as the best algorithm models. Intersecting the variables selected by LASSO regression, RF, and XGBoost across the three levels yielded 16 optimal predictive variables. In order, these were, within 24 hours of ICU admission: highest pH, highest albumin (Alb), highest body temperature, lowest blood lactate (Lac), highest Lac, highest serum creatinine (SCr), highest Ca²⁺, lowest hemoglobin (Hb), lowest white blood cell count (WBC), age, simplified acute physiology score Ⅲ (SAPS Ⅲ), highest WBC, acute physiology score Ⅲ (APS Ⅲ), lowest Na⁺, body mass index (BMI), and lowest activated partial thromboplastin time (APTT). ROC curve analysis showed that the logistic regression model built on these 16 variables was the best predictive model, with an AUC of 0.806 (95%CI 0.778-0.835) in the validation set. The calibration and DCA curves showed that the model was well calibrated, with a net benefit of up to 0.3. Its predictive efficacy was clearly better than that of traditional models based on a single functional score [APS Ⅲ, SAPS Ⅲ, and sequential organ failure assessment (SOFA)], whose AUCs (95%CI) were 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively.
Conclusions: The logistic regression model built on 16 optimal predictive variables, including pH, Alb, body temperature, Lac, SCr, Ca²⁺, Hb, WBC, SAPS Ⅲ score, APS Ⅲ score, Na⁺, BMI, and APTT, is the best predictive model for 28-day mortality risk in patients with septic shock. Its performance is stable, with high discrimination and accuracy.
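The net benefit reported from the decision curve analysis can be illustrated with a minimal sketch. It assumes the standard DCA definition, net benefit = TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt; the abstract does not state which formula or software the authors used. The function names and the toy predictions below are hypothetical and are not study data; only the validation-set counts (339 deaths among 988 patients) come from the abstract.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged patients who died; false positives: flagged survivors.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(prevalence, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Validation-set counts reported in the abstract.
deaths, total = 339, 988
prevalence = deaths / total

# Hypothetical toy outcomes and predicted risks, for illustration only.
toy_outcomes = [1, 1, 0, 0, 1, 0, 0, 0]
toy_risks = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05]

print(f"validation-set 28-day mortality: {prevalence:.2%}")
for pt in (0.1, 0.3, 0.5):
    model_nb = net_benefit(toy_outcomes, toy_risks, pt)
    all_nb = treat_all_net_benefit(prevalence, pt)
    print(f"pt={pt}: toy model {model_nb:.3f}, treat-all {all_nb:.3f}")
```

Under this definition TP/n can never exceed the event rate, so net benefit in the validation set is bounded above by about 0.343. A maximum net benefit of around 0.3 is therefore consistent with the reported 34.31% mortality.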
Authors
谢政
金晶
刘东松
陆圣译
俞慧
韩冬
孙炜
黄铭
Xie Zheng; Jin Jing; Liu Dongsong; Lu Shengyi; Yu Hui; Han Dong; Sun Wei; Huang Ming (Department of Emergency, Affiliated Hospital of Jiangnan University, Wuxi 214000, Jiangsu, China; Department of Neurology, Affiliated Hospital of Jiangnan University, Wuxi 214000, Jiangsu, China)
Source
《中华危重病急救医学》
CAS
CSCD
Peking University Core Journal
2024, Issue 4, pp. 345-352 (8 pages)
Chinese Critical Care Medicine
Funding
Scientific Research Project of the Wuxi Municipal Health Commission, Jiangsu Province (M202109).
Keywords
Supervised machine learning
Septic shock
Predictive model