Abstract
Principal component estimation is applied to estimate the parameters of a logistic regression model and to eliminate the influence of multicollinearity. First, six principal components whose cumulative contribution rate reaches at least 85% are selected, and a principal component estimate is computed for the dependent variable. The main factors influencing the onset of coronary heart disease are then identified. Finally, a regression model is obtained that relates the dependent variable (onset of coronary heart disease) to six main influencing factors: blood pressure (sbp), cumulative tobacco consumption (tobacco), low-density lipoprotein cholesterol (ldl), family history of heart disease (famhist), type-A behavior (typea), and age at onset (age). The results indicate that family history of heart disease is the strongest contributor to onset, followed by age; both are uncontrollable factors. Among the controllable factors, cumulative tobacco consumption has the greatest effect on the onset of coronary heart disease, so patients are advised to limit tobacco intake to keep their condition stable.
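The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `pc_logistic`, the Newton-Raphson fitting loop, and the synthetic data are all assumptions made for illustration, and only NumPy is used. The sketch standardizes the predictors, keeps the smallest number of principal components whose cumulative contribution reaches the threshold (85% here), fits a logistic regression on the component scores, and maps the coefficients back to the standardized predictors.

```python
import numpy as np

def pc_logistic(X, y, threshold=0.85, n_iter=25):
    """Hypothetical helper: logistic regression via principal component estimation."""
    # Standardize the predictors so that PCA works on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)

    # Eigen-decomposition, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Smallest k whose cumulative contribution rate reaches the threshold.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, threshold)) + 1, X.shape[1])
    V = eigvecs[:, :k]
    scores = Z @ V

    # Logistic regression on the component scores, fitted by Newton-Raphson.
    A = np.column_stack([np.ones(len(y)), scores])
    gamma = np.zeros(k + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ gamma)))
        grad = A.T @ (y - p)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        gamma = gamma + np.linalg.solve(hess, grad)

    # Map component coefficients back to the standardized predictors.
    beta_std = V @ gamma[1:]
    return k, gamma[0], beta_std

# Synthetic example (assumed data): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
logit = 0.8 * x1 + 0.8 * x2 - 1.0 * x3
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

k, intercept, beta_std = pc_logistic(X, y)
print("components kept:", k)
print("standardized coefficients:", beta_std)
```

Because the discarded component is essentially the difference between the two collinear predictors, the estimate assigns them nearly equal coefficients instead of the unstable, offsetting values an ordinary fit can produce under multicollinearity.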
Authors
胡倩
胡尧
刘伟
HU Qian; HU Yao; LIU Wei (School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou 550025, China; Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang, Guizhou 550025, China)
Source
《经济数学》 (Journal of Quantitative Economics)
2020, No. 4, pp. 123-129 (7 pages)
Funding
National Natural Science Foundation of China (11661018)
Guizhou Provincial Science and Technology Program (黔科合平台人才 [2017] No. 5788)
Keywords
Logistic regression model
multicollinearity
principal component estimation
coronary heart disease