Predictive Value of the Random Forest Algorithm for Diabetes Risk in People Undergoing Physical Examination (cited by: 33)
Abstract

Background: In 2017, China had the largest number of people with diabetes in the world, at 114 million. Identifying people at high risk of diabetes early and intervening effectively can reduce the risk of developing the disease.

Objective: To explore the value of the random forest algorithm for predicting diabetes risk in a physical examination population.

Methods: The study used data from the universal health examination of residents aged 35 to 74 years, carried out from September 2016 to March 2017 at the community health service centers of Shiyouxincun and Kaziwan subdistricts, Urumqi. After data completeness was taken into account, 6,727 examinees were included. Each record comprised a questionnaire, physical measurements, and laboratory tests: the questionnaire covered general demographic data; the physical measurements included height, body mass, and waist circumference; the laboratory tests included blood, blood glucose, and blood biochemistry indicators. The dataset was divided into a training set and a test set at a ratio of 3:1. Multivariate logistic regression and the random forest algorithm were each used to build a diabetes risk prediction model on the training set, and both models were validated on the test set. Predictive performance was evaluated by the prediction consistency rate and the area under the receiver operating characteristic (ROC) curve (AUC).

Results: Of the 6,727 examinees, 717 had previously diagnosed or newly detected diabetes, a prevalence of 10.7%. Among the patients with diabetes, 37.1% (266/717) were aged 65 years or older, 51.0% (366/717) were women, 94.0% (674/717) were Han Chinese, 35.3% (253/717) had a junior high school education, 48.0% (344/717) were overweight, 72.8% (522/717) had never smoked, and 77.0% (552/717) had never drunk alcohol. On the test set, the multivariate logistic regression model built on the training set had a sensitivity of 0.202, a specificity of 0.950, a prediction consistency rate of 0.696, a Youden index of 0.151, and an AUC of 0.685. The random forest model built on the training set had a sensitivity of 0.608, a specificity of 0.953, a prediction consistency rate of 0.864, a Youden index of 0.561, and an AUC of 0.702.

Conclusion: The random forest algorithm showed higher predictive performance for diabetes risk in the physical examination population, whereas multivariate logistic regression gives an intuitive interpretation of the factors influencing diabetes. We recommend combining the strengths of the two models in practice to maximize their value in disease risk prediction.
Authors: ZHANG Zhanlin; SUN Yong; TUO Xiaoqing; YELEDAN·Mahan; GONG Zheng; TIAN Tian; CHEN Zhen; GULISIYA·Hailili; DAI Jianghong; YAO Hua (School of Public Health, Xinjiang Medical University, Urumqi 830011, China; Health Management Center, Xinjiang Medical University First Affiliated Hospital, Urumqi 830011, China; Xinjiang Medical University First Affiliated Hospital, Urumqi 830011, China)
Source: Chinese General Practice (《中国全科医学》), 2019, Issue 9, pp. 1021-1026 (6 pages). Indexed in CAS and the Peking University Core Journals list.
Funding: Natural Science Foundation of Xinjiang Uygur Autonomous Region (2017D01C425)
Keywords: Diabetes mellitus; Prevalence; Random forest; Forecasting
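
The evaluation metrics named in the abstract (sensitivity, specificity, prediction consistency rate, and Youden index) can be illustrated with a minimal Python sketch. This is not the authors' code: the `ConfusionMatrix` class, the `youden_index` helper, and the counts used below are hypothetical and chosen only for illustration. The sketch assumes that "prediction consistency rate" means the share of test-set cases whose predicted class matches the observed class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary classification counts; diabetes is treated as the positive class."""
    tp: int  # diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic
    fp: int  # non-diabetic, predicted diabetic

    def sensitivity(self) -> float:
        # Share of true diabetes cases that the model flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-diabetic cases that the model clears.
        return self.tn / (self.tn + self.fp)

    def consistency_rate(self) -> float:
        # Assumed definition: share of all cases classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Hypothetical counts, not taken from the paper.
cm = ConfusionMatrix(tp=60, fn=40, tn=850, fp=50)
print(cm.sensitivity(), round(cm.specificity(), 3), cm.consistency_rate())

# Youden index recomputed from the rounded values reported in the abstract.
print(round(youden_index(0.608, 0.953), 3))  # random forest
print(round(youden_index(0.202, 0.950), 3))  # logistic regression
```

Recomputed from the rounded figures, the random forest Youden index is 0.561, matching the abstract. The logistic regression value comes out as 0.152 rather than the reported 0.151, presumably because the paper computed it from unrounded sensitivity and specificity.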
