Efficacy of the random forest model and the logistic regression model in predicting hyperuricemia: a comparative study
Abstract
Objective: To compare the performance of the random forest model and the logistic regression model in predicting hyperuricemia in a physical examination population.
Methods: A total of 2754 individuals undergoing physical examination were enrolled. Hyperuricemia prediction models were built with the random forest model and the logistic regression model, and the predictive performance of the two models was evaluated by the area under the receiver operating characteristic curve.
Results: In the variable importance analysis of the random forest model, the top five variables were, in order, serum creatinine, triglycerides, waist circumference, body mass index, and urea nitrogen. The area under the curve of the random forest prediction model was 0.759 (95% CI: 0.746-0.772), with a sensitivity of 97.2% and a specificity of 54.5%. Logistic regression analysis showed that sex, waist circumference, body mass index, triglycerides, and serum creatinine were influencing factors for hyperuricemia (all P<0.05). The area under the curve of the logistic regression prediction model was 0.658 (95% CI: 0.647-0.669), with a sensitivity of 87.7% and a specificity of 43.9%. The area under the curve of the random forest prediction model was greater than that of the logistic regression model (P<0.05).
Conclusion: The logistic regression model gives a directly interpretable estimate of how each variable affects disease risk. The random forest model predicts hyperuricemia better and provides an importance score for each factor, so it can serve as a supplement to the logistic regression prediction model.
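The evaluation described in the abstract (area under the ROC curve, sensitivity, and specificity for two competing models) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the labels and scores are invented toy values rather than data from the study, the function names `roc_auc` and `sensitivity_specificity` are hypothetical, and the 0.5 cut-off is arbitrary. The statistical test the authors used to compare the two curves is not stated in the abstract and is not reproduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case receives
    the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when a score >= threshold is
    classified as positive (here: predicted hyperuricemia)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented): 1 = hyperuricemia, 0 = no hyperuricemia.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical predicted probabilities from two models.
scores_model_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
scores_model_b = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
sens_a, spec_a = sensitivity_specificity(labels, scores_model_a, 0.5)
sens_b, spec_b = sensitivity_specificity(labels, scores_model_b, 0.5)

print(f"model A: AUC={auc_a:.4f}, sensitivity={sens_a:.2f}, specificity={spec_a:.2f}")
print(f"model B: AUC={auc_b:.4f}, sensitivity={sens_b:.2f}, specificity={spec_b:.2f}")
```

The pairwise definition keeps the sketch dependency-free and easy to check by hand. For a sample the size of the one in the study, a library routine would normally be used instead, and sensitivity and specificity would be reported at a cut-off chosen from the ROC curve rather than a fixed 0.5.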
Authors: LIANG Bing-qian; HUANG Zhi-bi; LAI Yin-juan; MO Hai-juan; LU Hua-yuan; CHEN Qing-yun (School of Public Health, Guangxi Medical University, Nanning 530021, China; Medical Examination Center, the First Affiliated Hospital of Guangxi Medical University, Nanning 530021, China)
Source: Guangxi Medical Journal (《广西医学》), CAS, 2020, Issue 6, pp. 729-733 (5 pages)
Keywords: Hyperuricemia; Random forest model; Logistic regression model; Prediction model