Abstract
Internet data-based surveillance of infectious disease epidemics has become a prominent research topic in infectious disease prevention and control in recent years. By analysing the correlation between the nationwide dengue epidemic that broke out in September 2014, centred on Guangdong Province, and the Baidu index of dengue-related keywords, we found that the severity of the dengue epidemic in a region (province or city) is strongly positively correlated with the Baidu index of the keyword "dengue" (登革热) in that region. To predict epidemic dynamics in real time, we built a multivariate linear regression model based on the Baidu index of 12 dengue-related keywords. In leave-one-out cross-validation and retrospective testing, the Pearson correlation coefficients between the predicted and actual values on the test data reached 0.89 and 0.73, respectively. The experiments indicate that the model predicts dengue epidemic dynamics fairly accurately, and the study offers some guidance for internet data-based surveillance, prevention and control of infectious diseases.
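The evaluation procedure described in the abstract (a multivariate linear regression on 12 keyword indices, scored by leave-one-out cross-validation and a Pearson correlation between predicted and actual values) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the abstract does not list the 12 keywords or give the underlying data, so the example uses randomly generated synthetic values, and every name in it (`fit_ols`, `loocv_predictions`, `pearson`, `n_days`) is hypothetical rather than taken from the paper. The fit uses ordinary least squares through the normal equations, which may differ from the authors' actual fitting procedure.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xb = [[1.0] + list(row) for row in X]
    p = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xb, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(beta, row):
    """Predicted value for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def loocv_predictions(X, y):
    """Hold out each observation once, fit on the rest, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(beta, X[i]))
    return preds


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# Synthetic stand-in data (an assumption, NOT the paper's data):
# rows are observations, columns are search indices for 12 keywords.
rng = random.Random(0)
n_days, n_keywords = 60, 12
X = [[rng.uniform(0, 100) for _ in range(n_keywords)] for _ in range(n_days)]
true_w = [rng.uniform(0, 2) for _ in range(n_keywords)]
y = [sum(w * v for w, v in zip(true_w, row)) + rng.gauss(0, 20) for row in X]

preds = loocv_predictions(X, y)
r = pearson(preds, y)
print(f"LOOCV Pearson r on synthetic data: {r:.2f}")
```

Because each prediction comes from a model that never saw the held-out observation, the resulting correlation estimates out-of-sample accuracy rather than goodness of fit. The value printed here reflects only the synthetic data and says nothing about the 0.89 and 0.73 reported in the abstract.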
Source
《计算机应用与软件》 (Computer Applications and Software)
CSCD
2016, No. 7, pp. 42-46, 78 (6 pages in total)
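Funding
National Natural Science Foundation of China (31371338)
National Major Special Project for Infectious Diseases (2013ZX10004611-002, 2014ZX10004002-001)
Hunan University Young Teachers Growth Program (531107040720)
Hunan University Biomedical Supercomputing Project (531106011004)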
Keywords
Baidu index
Dengue
Quantitative prediction model