Abstract
Objective: To compare the performance of a random forest model and a Logistic regression model in predicting hyperuricemia in a physical examination population. Methods: A total of 2754 individuals undergoing physical examination were enrolled. Hyperuricemia prediction models were built with the random forest model and the Logistic regression model, and the predictive performance of the two models was evaluated by the area under the receiver operating characteristic curve. Results: Variable importance analysis in the random forest model showed that the top five variables were, in order, serum creatinine, triglycerides (TG), waist circumference, body mass index, and urea nitrogen. The area under the curve of the random forest prediction model was 0.759 (95% CI: 0.746-0.772), with a sensitivity of 97.2% and a specificity of 54.5%. Logistic regression analysis showed that sex, waist circumference, body mass index, TG, and serum creatinine were influencing factors for hyperuricemia (all P<0.05). The area under the curve of the Logistic regression prediction model was 0.658 (95% CI: 0.647-0.669), with a sensitivity of 87.7% and a specificity of 43.9%. The area under the curve of the random forest prediction model was greater than that of the Logistic regression model (P<0.05). Conclusion: The Logistic regression model gives a directly interpretable estimate of the risk each variable contributes to disease occurrence, whereas the random forest model predicts hyperuricemia better and yields an importance score for each factor. The random forest model can therefore serve as a supplement to the Logistic regression prediction model.
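The abstract compares the two models by area under the ROC curve, sensitivity, and specificity. As a minimal sketch of how these three quantities are defined, the Python below computes them from labels and predicted scores. The function names (`auc_rank`, `sens_spec`), the toy labels and scores, and the 0.5 cutoff are all assumptions made for illustration; they are not the study's data, and the abstract does not state which software or cutoff the authors used.

```python
from typing import Sequence, Tuple


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(y_true: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold predicts disease."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = hyperuricemia, 0 = no hyperuricemia.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # stand-in for model A
scores_b = [0.9, 0.5, 0.4, 0.3, 0.6, 0.5, 0.2, 0.1]  # stand-in for model B

if __name__ == "__main__":
    print("AUC A:", auc_rank(y_true, scores_a))
    print("AUC B:", auc_rank(y_true, scores_b))
    print("Sens/spec A at 0.5:", sens_spec(y_true, scores_a, 0.5))
```

The pairwise definition is quadratic in sample size, which is fine for a toy example; for a cohort of thousands, a rank-based or library implementation would be the usual choice. Testing whether two AUCs differ significantly, as the abstract reports, needs a separate procedure that this sketch does not cover.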
Authors
梁冰倩
黄志碧
赖银娟
莫海娟
陆华媛
陈青云
LIANG Bing-qian; HUANG Zhi-bi; LAI Yin-juan; MO Hai-juan; LU Hua-yuan; CHEN Qing-yun (School of Public Health, Guangxi Medical University, Nanning 530021, China; Medical Examination Center, the First Affiliated Hospital of Guangxi Medical University, Nanning 530021, China)
Source
《广西医学》
CAS
2020, Issue 6, pp. 729-733 (5 pages)
Guangxi Medical Journal