Abstract
Objective: To predict the risk of hypertension from residents' health big data and to identify important factors associated with hypertension. Methods: Using a dataset from a community public health system, hypertension risk prediction models were built with three machine learning algorithms (logistic regression, random forest, and support vector machine), and their predictive performance was compared. The factors influencing hypertension were additionally analyzed with the Gini decrease method in the random forest. Results: The support vector machine model's accuracy (87.00%), precision (85.00%), recall (88.00%), F1 score (0.88), and area under the ROC curve (0.932) were at least as good as those of the random forest model (85.00%, 84.00%, 87.00%, 0.87, and 0.929) and the logistic regression model (83.00%, 85.00%, 81.00%, 0.81, and 0.920). The Gini analysis showed that coronary heart disease, age, diabetes, and education level played an important role in predicting hypertension risk; current education level, occupation type, other chronic diseases, marital status, body mass index, paternal hypertension, maternal hypertension, alcohol consumption, a salty diet, smoking, and exercise played a moderate role; and sex and a preference for vegetarian, sweet, oily, or spicy food played little role. Conclusion: The support vector machine model performed best in predicting hypertension risk. People with a low education level, people with comorbid coronary heart disease, diabetes, or other chronic diseases, people with a family history of hypertension, and the elderly are susceptible to hypertension; for these groups, attention should focus on body mass index, alcohol consumption, and dietary habits (a salty diet).
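To make the quantities reported in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how accuracy, precision, recall, and F1 are derived from confusion-matrix counts, and of the Gini impurity decrease produced by a single tree split. In a random forest, a feature's Gini-based importance is typically obtained by averaging such decreases over all splits that use the feature. The function names and all toy numbers below are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


if __name__ == "__main__":
    # Hypothetical counts: 1 = hypertensive, 0 = not hypertensive.
    acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, fn=10, tn=40)
    print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")

    # Hypothetical split of eight residents on some binary feature.
    parent = [1, 1, 1, 1, 0, 0, 0, 0]
    left = [1, 1, 1, 0]    # e.g. feature present
    right = [1, 0, 0, 0]   # e.g. feature absent
    print(f"Gini decrease of the split: {gini_decrease(parent, left, right):.3f}")
```

A split that separates the classes more cleanly yields a larger decrease, which is why features used in such splits rank higher in a Gini-based importance analysis.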
Authors
周阳
王妮
黄艳群
陈晨
李盛俊
陈卉
ZHOU Yang; WANG Ni; HUANG Yan-qun; CHEN Chen; LI Sheng-jun; CHEN Hui (Class of 2017, Major in Hearing and Speech Rehabilitation, Capital Medical University, Beijing 100069, China; School of Biomedical Engineering, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Basic Research in Applied Clinical Biomechanics, Capital Medical University, Beijing 100069, China)
Source
《医学信息》
2020, Issue 6, pp. 1-4, 12 (5 pages in total)
Journal of Medical Information
Funding
National Natural Science Foundation of China (Grant No. 81971707).
Keywords
Hypertension
Machine learning
Community resident health records
Gini decrease method