Abstract
Objective: To build a time-series analysis fusion tool from multisource internet data and use it to accurately predict the incidence trend of hepatitis in Chongqing.
Methods: Hepatitis incidence data were obtained from the database of the Centre for Health and Disease Control. Air pollutant data came from the official website of the China National Environmental Monitoring Centre, climate data from the National Meteorological Information Centre, and web search index data from the Baidu search engine. All series cover November 2013 to May 2023. Building on existing time-series analysis methods, multisource data were used to correct the residual component of the decomposition model. A delayed input neural network (DINN) was constructed to combine the respective strengths of non-autoregressive (NAR) networks and long short-term memory (LSTM) recurrent neural networks. Optimization modules, including the nutcracker optimizer algorithm (NOA) and the joint quantile Huber loss (JQHL), were then added to DINN to form DINN+.
Results: Compared with common single-input models and synchronous multi-input models, DINN achieved the best prediction performance. After hyperparameter optimization and loss-function optimization were added, the predictive performance of DINN+ improved further: on the test set, the mean squared error (MSE) was 0.1709, the mean absolute error (MAE) was 0.4612, the root mean squared error (RMSE) was 0.5821, the mean absolute percentage error (MAPE) was 0.0626, and R² was 0.8840.
Conclusion: Drawing on the idea of fusing diverse methods with multisource data, and building on earlier time-series analysis methods, this paper proposes DINN+, an optimized model with good accuracy and generalization ability. The model enriches and supplements methodological research on calibrating infectious disease time-series prediction with multisource data, and it can serve as a new benchmark for future analysis of public health influencing factors and trend prediction for infectious diseases.
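For readers unfamiliar with the five test-set metrics reported in the abstract, the following is a minimal sketch of their standard definitions. It is not the authors' evaluation code: the function name `regression_metrics` and the sample values are hypothetical, and MAPE is returned as a fraction on the assumption that the reported 0.0626 follows that convention.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, MAPE and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE as a fraction, not a percentage; undefined if any observed value is 0.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R^2 compares the residual sum of squares with the variance around the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "MAPE": mape, "R2": r2}


# Hypothetical observed and predicted values, for illustration only.
observed = [8.0, 10.0, 12.0, 10.0]
predicted = [7.0, 10.0, 13.0, 10.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower MSE, MAE, RMSE and MAPE indicate smaller prediction errors, while R² closer to 1 indicates that the predictions explain more of the variance in the observed series.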
Authors
YAO Tianhua (姚田华)
CHEN Xicheng (陈锡程)
WU Yazhou (伍亚舟)
Department of Health Statistics, Faculty of Military Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing 400038, China
Source
Journal of Army Medical University (《陆军军医大学学报》)
CAS
CSCD
Peking University Core Journal (北大核心)
2024, Issue 12, pp. 1447-1456 (10 pages)
Funding
General Program of the National Natural Science Foundation of China (82173621, 81872716)
Young Scientists Fund (82304249)
Keywords
time series analysis
incidence trend
long short-term memory (LSTM)
neural network
meta-heuristic algorithm