Abstract
Objective: To identify risk factors for the onset of coronary heart disease (CHD) using logistic regression analysis, and to build CHD risk prediction models with common machine learning algorithms, providing a theoretical reference for the early prevention and screening of CHD. Methods: Coronary heart disease data published on Kaggle were preprocessed and subjected to feature selection, after which logistic regression analysis was used to identify the main risk factors. Five common machine learning algorithms (logistic regression, support vector machine, linear discriminant analysis, decision tree, and random forest) were then used to predict CHD onset. Results: Sex, age, average number of cigarettes smoked per day, total cholesterol level, systolic blood pressure, and blood glucose level were the main risk factors for CHD onset within 10 years. All five algorithms showed good accuracy and stability. Compared with the statistics-based linear discriminant analysis, decision tree and random forest showed no obvious advantage. Conclusion: Machine learning techniques are suitable for predicting the risk of CHD onset and can provide a reference for CHD prevention and control.
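The risk-factor step described in the abstract (fitting a logistic regression and reading each coefficient as an odds ratio) can be illustrated with a minimal sketch. This is not the authors' code or data: the dataset is synthetic, the feature names `age_std` and `noise` are hypothetical, and the fitting routine is plain batch gradient descent chosen only to keep the example dependency-free.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def make_synthetic(n=400, seed=0):
    """Synthetic cohort (assumption): one true risk factor, one unrelated feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age_std = rng.gauss(0, 1)  # standardized age, hypothetical
        noise = rng.gauss(0, 1)    # feature with no effect on the outcome
        p = sigmoid(-1.0 + 1.2 * age_std)
        X.append([age_std, noise])
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_synthetic()
w = fit_logistic(X, y)

# exp(coefficient) is the odds ratio per one-unit increase in the feature.
odds_ratios = {name: math.exp(c) for name, c in zip(["age_std", "noise"], w[1:])}
for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.2f}")
```

An odds ratio clearly above 1 marks a candidate risk factor, while a value near 1 suggests no association. A real analysis would also report confidence intervals and p-values, which this sketch omits.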
Authors
LI Jie; XIANG Fei (School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, Hubei Province, China)
Source
Chinese Journal of Medical Library and Information Science (《中华医学图书情报杂志》)
CAS
2020, Issue 6, pp. 7-13 (7 pages)
Keywords
Coronary heart disease
Risk prediction model
Multivariate logistic regression analysis
Machine learning
Random forest