Abstract
With the development of information technology in natural disaster emergency management, a growing number of national departments, industry units, and social institutions provide disaster-related information and data services via the World Wide Web. To address the scattered distribution and difficult extraction of disaster-related information on the Web, this paper proposes a disaster-related information data acquisition method. It combines full acquisition with incremental acquisition to collect both historical and real-time data, and integrates dynamic page acquisition and simulated login techniques so that it applies to a variety of web page types. For highly comprehensive websites, a topic-relevance judgment method is designed to extract disaster-related information more accurately. Experiments show that the method effectively acquires disaster-related data and information, providing data support for disaster prevention, mitigation, and emergency management.
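The combination of full and incremental acquisition, and the topic-relevance judgment for comprehensive websites, can be sketched roughly as follows. This is a minimal illustration under assumed details, not the paper's implementation: the keyword list, function names, and content-fingerprint scheme are all hypothetical choices made for the example.

```python
import hashlib

# Hypothetical keyword set for the topic-relevance judgment.
DISASTER_KEYWORDS = {"earthquake", "flood", "typhoon", "landslide", "drought"}

def content_fingerprint(html: str) -> str:
    """Fingerprint a page so changed content can be detected between crawls."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()

def incremental_filter(pages: dict[str, str], seen: dict[str, str]) -> dict[str, str]:
    """Keep only pages that are new or changed since the previous crawl.

    `pages` maps URL -> fetched HTML; `seen` maps URL -> last fingerprint
    and is updated in place. A first call with an empty `seen` behaves as
    a full acquisition; later calls behave as incremental acquisition.
    """
    changed = {}
    for url, html in pages.items():
        fp = content_fingerprint(html)
        if seen.get(url) != fp:
            changed[url] = html
            seen[url] = fp
    return changed

def is_disaster_related(text: str, threshold: int = 1) -> bool:
    """Naive topic-relevance check: count keyword hits in the page text."""
    hits = sum(1 for kw in DISASTER_KEYWORDS if kw in text.lower())
    return hits >= threshold
```

On the first pass every page is returned (full acquisition); on later passes only pages whose fingerprint changed are returned (incremental acquisition), and the relevance check filters out off-topic pages on general-purpose sites.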
Authors
DENG Yuting; HU Zhuowei; HU Yiqi
(College of Resources Environment and Tourism, Capital Normal University, Beijing 100048, China; Beijing Key Laboratory of Resource Environment and Geographic Information System, Capital Normal University, Beijing 100048, China; State Key Laboratory Incubation Base of Urban Environmental Processes and Digital Simulation, Capital Normal University, Beijing 100048, China)
Source
《自然灾害学报》
CSCD
Peking University Core Journals (北大核心)
2022, No. 5, pp. 31-36 (6 pages)
Journal of Natural Disasters
Funding
National Key Research and Development Program of China (2018YFC1508902, 2017YFC0506501).
Keywords
disaster-related information data
data acquisition
multi-source data
web crawler
disaster prevention and mitigation