
随机森林和决策树模型在轻型缺血性脑卒中患者复发预测中的应用分析

Recurrence prediction of patients with minor ischemic stroke based on random forest and decision tree
Abstract

Objective: To construct random forest and decision tree models for predicting recurrence within two years in patients with minor ischemic stroke (MIS), and to assess the clinical utility of the models.

Methods: The medical records of 520 MIS patients who visited the Department of Neurology of Shanxi Cardiovascular Hospital from July 1 to December 31, 2020 were retrospectively collected. Patients were divided into a recurrence group and a non-recurrence group according to whether they had a recurrence within two years. Missing data were imputed with missForest. Predictor variables were selected on the basis of a literature search and expert discussion, followed by univariate analysis. Class imbalance was addressed with the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC). Random forest and decision tree models were built using Bayesian optimization with 10-fold cross-validation and compared with a Logistic regression model. Discrimination was evaluated with the area under the receiver operating characteristic curve (AUC), and calibration with the Brier score (BS) and calibration curves. The predictions of the best-performing model were explained using SHapley Additive exPlanations (SHAP).

Results: A total of 93 patients (17.9%) had a recurrence within two years. The two groups differed significantly (P<0.05) in age; in the proportions of smoking, diabetes, circulation territory of the infarct, and multiple cerebral infarction; and in diastolic blood pressure, hematocrit, platelet count, and low-density lipoprotein level. In the test set, the AUC (95% CI) for predicting recurrence within two years was 0.764 (0.691 to 0.835) for the Logistic regression model, 0.743 (0.668 to 0.818) for the decision tree model, and 0.892 (0.843 to 0.941) for the random forest model; the corresponding BS values were 0.200, 0.211, and 0.142. The random forest model performed best, with an accuracy of 0.822, a sensitivity of 0.818, a positive predictive value of 0.808, and a negative predictive value of 0.835. SHAP analysis showed that the five most important variables in the random forest model were age, low-density lipoprotein, smoking, diabetes, and diastolic blood pressure.

Conclusions: Compared with the decision tree and Logistic regression models, the random forest model performs better in predicting recurrence of MIS within two years.
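The abstract reports discrimination as AUC and calibration as the Brier score. As an illustration only, the following minimal Python sketch shows how these two metrics can be computed from predicted probabilities. The function names `brier_score` and `auc` and the five toy cases are assumptions made for this example; they are not the study's code or data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1).

    Lower is better; 0 means perfect probabilistic predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen positive case receives a
    higher predicted probability than a randomly chosen negative case,
    counting ties as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, NOT from the study): 1 = recurrence, 0 = no recurrence.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.3]

toy_brier = brier_score(y_true, y_prob)
toy_auc = auc(y_true, y_prob)
print(f"Brier score: {toy_brier:.3f}")
print(f"AUC: {toy_auc:.3f}")
```

The two metrics answer different questions: AUC depends only on how cases are ranked, while the Brier score also penalizes probabilities that are far from the observed outcomes, which is why the abstract reports both.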
Authors: Mo Qiuhong (莫秋红); Ding Xiaobo (丁晓波); Zhang Yanbo (张岩波); Li Weirong (李伟荣). School of Public Health, Shanxi Medical University, Taiyuan 030000, China; Department of Neurology, Shanxi Cardiovascular Hospital & Affiliated Cardiovascular Hospital of Shanxi Medical University, Taiyuan 030000, China.
Source: Journal of Neuroscience and Mental Health (《神经疾病与精神卫生》), 2024, Issue 2, pp. 77-82 (6 pages).
Funding: Key Research and Development Program of Shanxi Province (2021XM14).
Keywords: Stroke; Minor ischemic stroke; Recurrence; Random forest; Logistic regression; Decision trees