大气污染对学生因呼吸系统症状缺课影响的机器学习算法应用研究

Applied research of the impact of air pollution on absenteeism in students with respiratory issues through machine learning analysis
Abstract

Objective: To evaluate the performance of machine learning prediction models on short-term series of student absenteeism due to respiratory symptoms caused by air pollution, in order to provide a methodological reference for the early warning of disease occurrence in schools.

Methods: Based on short-term series data on student absenteeism due to respiratory symptoms in Jiangsu Province from September 2019 to October 2022, the study integrated average concentration data for atmospheric pollutants. Univariate distributed lag nonlinear models were used to select the optimal lag variable for each pollutant. An extreme gradient boosting (XGBoost) model was then built to predict the frequency of absenteeism due to respiratory symptoms and was compared with a seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) model.

Results: From 2019 to 2022, a daily average of 9709 students in Jiangsu Province were absent due to respiratory symptoms. The daily average air quality index (AQI) was 76.96, and the daily average mass concentrations of PM2.5, PM10, NO2, and O3 were 35.75, 61.13, 28.89, and 104.81 μg/m³, respectively. Granger causality tests indicated that AQI, PM2.5, PM10, NO2, and O3 were all predictors of the absenteeism frequency series (F = 1.46, 1.79, 1.67, 3.41, and 2.18, respectively; all P < 0.01). The single-day lag effects of PM2.5, PM10, NO2, and O3 reached their peak relative risk (RR) values at lag4, lag0, lag0, and lag4, respectively. Compared with the SARIMAX model, the XGBoost model using the optimal lag variables of the pollutants reduced the mean absolute error (MAE) from 2.251 to 0.475, the mean absolute percentage error (MAPE) from 0.429 to 0.080, and the root mean square error (RMSE) from 2.582 to 0.713. With the warning threshold set at P75, the XGBoost model improved on the SARIMAX model in sensitivity (from 0.086 to 0.694), specificity (from 0.979 to 0.988), and the Youden index (from 0.065 to 0.682).

Conclusions: The XGBoost model shows good predictive performance and early warning capability for short-term series of student absenteeism due to respiratory symptoms caused by air pollution. Schools could adopt this model where appropriate to detect disease outbreaks early, issue warnings, carry out prevention and control, and improve school health work.
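The evaluation measures named in the abstract (MAE, MAPE, RMSE, and the sensitivity, specificity, and Youden index of a P75-based warning) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the numbers are made-up toy values rather than the study's data, and the alert rule (a day counts as an alert when its value lies strictly above the 75th percentile of the observed series) is only one plausible reading, since the abstract does not state how the threshold was applied.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumes no zero actuals)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def warning_metrics(actual, predicted, threshold):
    """Sensitivity, specificity and Youden index of a threshold-based alert.

    Assumption: a day is an alert day when its value is strictly above the
    threshold. Requires at least one alert day and one non-alert day in
    `actual`, otherwise a division by zero occurs.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        real_alert, model_alert = a > threshold, p > threshold
        if real_alert and model_alert:
            tp += 1
        elif real_alert:
            fn += 1
        elif model_alert:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1


# Toy daily absenteeism counts (illustrative only, not from the study).
actual = [10, 12, 9, 20, 25, 11, 8, 22]
predicted = [11, 12, 10, 18, 24, 13, 8, 15]

p75 = percentile(actual, 75)
print("MAE :", mae(actual, predicted))
print("MAPE:", mape(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("P75 threshold:", p75)
print("sensitivity, specificity, Youden:", warning_metrics(actual, predicted, p75))
```

The Youden index is sensitivity plus specificity minus 1, which is why a model with high specificity but very low sensitivity still scores close to zero on it.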
Authors: CAO Chengbin (曹承斌), YANG Wenyi (杨文漪), YU Xiaojin (余小金), WANG Yan (王艳), YANG Jie (杨婕). School of Public Health, Southeast University, Nanjing 210009, Jiangsu Province, China
Source: Chinese Journal of School Health (《中国学校卫生》), 2024, Issue 6, pp. 770-774 (5 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding: Jiangsu Province Postgraduate Research and Practice Innovation Program (SJCX22_0076).
Keywords: Air pollution; Respiratory system; Absenteeism; Models, statistical; Students