Abstract
[Objective] This paper uses machine learning to build and evaluate a prediction and early warning model for deaths from circulatory system diseases, providing a reference for disease prevention. [Methods] We analyzed circulatory system disease death data from a region of China for 2014-2018 and built prediction models with GAM, RF and XGBoost. We then used a distributed lag nonlinear model to calculate cumulative lag effects, built the early warning model from those results, and evaluated the models. [Results] The cumulative lag analysis showed that sustained low and high temperatures, long sunshine hours and high concentrations of environmental pollutants increase the risk of death from circulatory system diseases; the 7-day cumulative relative risks were 1.236, 1.130, 1.560, 1.062, 1.218, 1.153 and 1.796, respectively. The RF and XGBoost models had RMSEs of 4.979 and 5.341, respectively, indicating good performance. Age, sex, temperature, sunshine hours, and the concentrations of SO2, NO2, CO, O3, PM10 and PM2.5 were selected as feature variables, and early warning thresholds determined from the data screened by cumulative lag effects gave good warning performance. The sensitivity, specificity and area under the curve of the XGBoost predictions were 0.948, 0.939 and 0.941, respectively. [Limitations] The study lacks independent data on concomitant diseases and data on disease progression. [Conclusions] The increase in deaths in this region is associated with older age, male sex, and rises in temperature, sunshine hours and pollutant concentrations. The prediction and early warning model built with XGBoost performs well and can serve as a reference for relevant departments in disease prevention and intervention.
Authors
王琰
胥美美
童俞嘉
苟欢
蔡荣
单治易
安新颖
Wang Yan; Xu Meimei; Tong Yujia; Gou Huan; Cai Rong; Shan Zhiyi; An Xinying (Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
PKU Core (Peking University Core Journals)
2022, Issue 10, pp. 79-92 (14 pages)
Data Analysis and Knowledge Discovery
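Funding
This work is one of the research outputs of the Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (Project No. 2021-I2M-1-033).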