Abstract
Objective To construct machine learning-based clinical prediction models for diabetic retinopathy (DR) in the windy desert area and the loess hilly area of Gansu province, and to analyze the factors influencing DR.
Methods This was a cross-sectional study. Models were built and validated using Gansu province epidemiological data from the China National Diabetic Complications Study (CNDCS). Patients with type 2 diabetes mellitus (T2DM) were enrolled by multistage stratified random sampling and split into training and test sets at a ratio of 7:3. Data on concurrent DR were collected for T2DM patients in the two areas. Recursive feature elimination (RFE) was used to select the optimal variables for each area. Five machine learning algorithms, logistic regression (LR), decision tree (DT), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost), were used to train the models and were compared by area under the curve (AUC) to select the optimal model. Shapley additive explanations (SHAP) analysis was then used to visually interpret the results of the optimal model.
Results A total of 1739 patients with T2DM were enrolled, of whom 23.63% (411/1739) had concurrent DR. RFE selected 8 optimal variables for the windy desert area and 14 for the loess hilly area. On comprehensive evaluation, the best prediction model was RF in the windy desert area (training set AUC=0.874, test set AUC=0.737) and XGBoost in the loess hilly area (training set AUC=0.899, test set AUC=0.783). SHAP analysis showed that the top five distinguishing features in the RF model were glycated hemoglobin A1c (HbA1c), duration of diabetes, heart rate, urine microalbumin, and systolic blood pressure; in the XGBoost model they were duration of diabetes, urine microalbumin, serum albumin, blood urea nitrogen, and HbA1c.
Conclusions The RF and XGBoost models showed high reliability in assessing DR risk indicators. Duration of diabetes, HbA1c, and urine microalbumin are factors influencing DR.
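The workflow described in the abstract (7:3 split, RFE variable selection, five classifiers compared by AUC) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic (generated with `make_classification` to mimic the cohort size and DR prevalence), the RFE ranking estimator and all hyperparameters are guesses because the abstract does not report them, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only scikit-learn, and the SHAP step is omitted. The study ran the procedure separately for each region; the sketch runs it once.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cohort (assumption): 1739 rows, about 24% positives.
X, y = make_classification(
    n_samples=1739, n_features=20, n_informative=8,
    weights=[0.76, 0.24], random_state=0,
)

# 7:3 split into training and test sets, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Recursive feature elimination; the random forest ranker and the target of
# 8 variables are illustrative choices, not details taken from the paper.
selector = RFE(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=8,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Five candidate models; "GB" is a stand-in for XGBoost (assumption).
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "DT": DecisionTreeClassifier(max_depth=4, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record training and test AUC.
results = {}
for name, model in models.items():
    model.fit(X_train_sel, y_train)
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train_sel)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test_sel)[:, 1])
    results[name] = (train_auc, test_auc)
    print(f"{name}: train AUC={train_auc:.3f}, test AUC={test_auc:.3f}")

# Pick the model with the highest test-set AUC.
best = max(results, key=lambda k: results[k][1])
print("Best by test AUC:", best)
```

Selecting by test-set AUC alone is a simplification; the abstract refers to a "comprehensive evaluation" whose criteria it does not list. A gap between training and test AUC, like the one reported for both final models, is expected for tree ensembles and is worth inspecting alongside the ranking.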
Authors
洪豆豆
杨建宁
乔文俊
王云芳
张琦
刘静
Hong Doudou; Yang Jianning; Qiao Wenjun; Wang Yunfang; Zhang Qi; Liu Jing (The First Clinical Medical College of Gansu University of Chinese Medicine, Lanzhou 730000, China; The First Clinical Medical College of Ningxia Medical University, Yinchuan 750000, China; Metabolic Disease Diagnosis and Treatment Center, Gansu Provincial People's Hospital, Lanzhou 730000, China; Department of Geriatric Medicine, Gansu Provincial People's Hospital, Lanzhou 730000, China)
Source
《中华糖尿病杂志》
CAS
CSCD
Peking University Core Journals
2024, Issue 3, pp. 297-306 (10 pages)
CHINESE JOURNAL OF DIABETES MELLITUS
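Funding
National Natural Science Foundation of China (81960173, 82160166)
Gansu Province Key Research and Development Program (22YF7FA096)
Gansu Provincial People's Hospital Intramural Research Fund (22GSSYA-1)
Lanzhou Talent Innovation and Entrepreneurship Project (2021-RC-136)
Sweet Doctor Cultivation Program (2021SD01)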
Keywords
Diabetes mellitus
Retinopathy
Machine learning
Risk factors