Machine learning-based predictive modeling for 10-year risk of heart failure with interpretability

Abstract: Objective: To develop machine learning-based models for predicting the 10-year risk of heart failure and to improve their interpretability with the SHAP method, thereby enhancing the accuracy and clinical utility of heart failure risk assessment. Methods: Data came from the UK Biobank (UKB), which covers 502,349 UK adults aged 40-70 years, with baseline data collected in 2006-2010. The analysis included 487,572 participants who did not develop heart failure and 10,374 who did over a 10-year follow-up, with heart failure events defined by ICD-10 codes. Prediction models were built with three machine learning algorithms: LightGBM, XGBoost and CatBoost. Data preprocessing, feature selection and model performance evaluation were carried out in Python and RStudio, and the SHAP method was used to visualize and interpret the models' predictions. Results: After the samples were balanced by random under-sampling, the models effectively predicted the onset of heart failure within 10 years. LightGBM showed the best predictive performance, followed by CatBoost and XGBoost. SHAP value analysis identified age, cystatin C, the number of treatments or medications taken, a previous diagnosis of cardiovascular disease, and the cardiovascular disease-related polygenic risk score as important predictors of heart failure risk. Conclusion: Machine learning models are effective for predicting heart failure risk, with LightGBM performing best among the models compared. SHAP value analysis offers a new perspective on the drivers behind model predictions and can support clinical decision-making and risk management.
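The random under-sampling step mentioned in the Results can be illustrated with a minimal sketch. It uses only the class sizes reported in the abstract; the 1:1 target ratio, the seed, and the helper name `random_undersample` are assumptions made for illustration, since the abstract does not state how the authors configured this step.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Return sorted row indices forming a 1:1 balanced subset.

    Every minority-class row is kept; an equally sized random subset of
    majority-class rows is drawn without replacement.
    (Hypothetical helper; the 1:1 ratio is an assumption.)
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, _), (minority, n_minority) = counts.most_common(2)
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    kept_majority = rng.sample(majority_idx, n_minority)
    return sorted(minority_idx + kept_majority)


# Class sizes reported in the abstract: 0 = no heart failure, 1 = heart failure.
labels = [0] * 487_572 + [1] * 10_374
balanced_idx = random_undersample(labels, seed=42)
balanced_counts = Counter(labels[i] for i in balanced_idx)
print(balanced_counts[0], balanced_counts[1])
```

Under these assumptions the balanced set keeps all 10,374 heart failure cases and discards most of the non-cases, so a model trained on it sees both classes equally often. Because the balanced prevalence no longer matches the cohort, predicted probabilities would need recalibration before being read as absolute 10-year risks.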

Authors: CAI Jia-yin (蔡佳音); CHEN Hai-tao (陈海涛); WANG Zeng-wu (王增武) (Division of Prevention and Community Health, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102308, China; School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen 518107, China)

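Affiliations: Chinese Academy of Medical Sciences & Peking Union Medical College; School of Public Health (Shenzhen), Sun Yat-sen University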

Source: Chinese Journal of Cardiovascular Research (《中国心血管病研究》), CAS, 2024, Issue 4, pp. 323-330 (8 pages)

Funding: Project commissioned by the National Health Commission (NHC 2020-609).

Keywords: Heart failure; Risk prediction; Machine learning; LightGBM; SHAP values

Classification number: R541.6 [Medicine and Health: Cardiovascular Diseases]

