Analysis of Influencing Factors of Lung Cancer and Construction of a Prediction Model Based on the Random Forest Algorithm
Abstract
Objective To analyze the factors influencing lung cancer and to construct a prediction model based on the random forest algorithm, providing a clinical basis and technical support for screening populations at high risk of lung cancer.
Methods A total of 153 lung cancer patients who were hospitalized in the Department of General Thoracic Surgery, First Affiliated Hospital of Anhui Medical University, between March and December 2022 and who completed a questionnaire were enrolled as the lung cancer group. Using a case-control design, 306 non-tumor patients admitted during the same period were selected as the control group at a 1:2 ratio. Questionnaires were used to collect demographic characteristics and related information from both groups. Univariate and multivariate logistic regression were used to analyze the factors influencing lung cancer and to build a prediction model, and the random forest algorithm was used to rank the importance of each variable.
Results Univariate analysis showed statistically significant differences between the two groups in sex, age, related symptoms (chronic shortness of breath, recurrent chest pain, chronic cough or sputum production), history of chronic bronchitis, family history of lung cancer, dietary habits (eating too fast, overeating, irregular meals), history of alcohol consumption, history of smoking, frequently staying up late, regular physical exercise, exposure history (long-term exposure to dust or cotton dust, long-term exposure to coal smoke), and irritable personality (P<0.05). Multivariate logistic regression showed that chronic cough or sputum production (OR=5.136, P<0.05), family history of lung cancer (OR=0.400, P<0.05), overeating (OR=3.814, P<0.05), irregular meals (OR=5.876, P<0.05), history of smoking (OR=6.036, P<0.05), long-term exposure to dust or cotton dust (OR=5.556, P<0.05), and irritable personality (OR=5.481, P<0.05) were independent influencing factors of lung cancer. The random forest model constructed from the logistic regression results predicted the risk of lung cancer with an area under the ROC curve (AUC) of 0.917, a sensitivity of 78.36%, and a specificity of 89.10%. The feature importance ranking, from highest to lowest, was long-term exposure to dust or cotton dust, history of smoking, family history of lung cancer, irritable personality, irregular meals, and overeating.
Conclusion Long-term exposure to dust or cotton dust, history of smoking, family history of lung cancer, irritable personality, irregular meals, and overeating are independent influencing factors of lung cancer. The random forest model built on these factors has high predictive value for lung cancer and may serve as a basis for developing targeted prevention and treatment strategies.
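The abstract reports odds ratios, sensitivity, specificity, and an area under the ROC curve. As a reading aid, the following minimal sketch shows how these quantities are defined, using only the Python standard library. It is not the authors' code: all data values and function names (`odds_ratio`, `sensitivity_specificity`, `roc_auc`) are hypothetical. The sketch computes a crude odds ratio from a 2x2 table, whereas the study reports odds ratios from multivariate logistic regression.

```python
# Illustrative sketch only: toy data, not the study's data or code.

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Crude odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random case scores higher than
    a random control, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = case, 0 = control) and predicted risk scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
# Classify as "case" at an assumed threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
# Hypothetical 2x2 table: 30 exposed cases, 10 unexposed cases,
# 20 exposed controls, 40 unexposed controls.
crude_or = odds_ratio(30, 10, 20, 40)

print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f} OR={crude_or:.2f}")
```

Sensitivity and specificity depend on the chosen classification threshold, while the AUC summarizes performance across all thresholds. The abstract does not state which threshold produced the reported 78.36% and 89.10%.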
Authors Dong Jiangyan; Zhang Liang; Zhang Jing; Guo Wenxia; Luo Le; Tan Yuexia (Department of General Thoracic Surgery, First Affiliated Hospital of Anhui Medical University, Hefei, Anhui 230022, China)
Source Chinese Journal of Medicine (《中国医刊》), CAS, 2023, No. 11, pp. 1188-1193 (6 pages)
Funding Natural Science Foundation of Anhui Province (1808085QH271).
Keywords Random forest algorithm; Lung cancer; Prediction model; High-risk population