Abstract
Regional population data is a key reference in research on earthquake prevention and disaster mitigation. Multiplying a region's population by the casualty rate associated with a given earthquake magnitude gives a preliminary estimate of the casualties an earthquake would cause in that region. To reduce the errors that arise when data is collected manually through traditional search engines, and to make the collection of large volumes of population data more efficient, this study takes the population data of Beijing as an example. It first uses XPath to analyze the structure, layout, and data distribution of the web pages, then filters the data with regular expressions, and then crawls the site through multiple levels of URLs until 6,859 records at the community level of Beijing are obtained. Finally, the records are saved to a MySQL database for persistent storage. Experimental results show that the crawler effectively avoids the data errors that occur in manual collection, with a valid data rate of 83.1%. The data collection process meets the requirements of efficiency, accuracy, and visualization.
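The pipeline described in the abstract (XPath extraction, regular-expression filtering, multi-level URL crawling, database storage, and the population-times-rate casualty estimate) can be illustrated with a minimal sketch. This is not the authors' crawler: the page fragments, the `crawl`, `store`, and `estimate_casualties` names, the 12-digit code pattern, and the table layout are all assumptions made for illustration. To stay self-contained it uses the limited XPath subset in the standard library's `xml.etree.ElementTree` and an in-memory SQLite database as a stand-in for MySQL.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical page fragments keyed by URL; a real crawler would fetch
# these over HTTP instead of reading them from a dictionary.
PAGES = {
    "index.html": (
        "<table>"
        "<tr><td><a href='d1.html'>District One</a></td></tr>"
        "<tr><td><a href='d2.html'>District Two</a></td></tr>"
        "</table>"
    ),
    "d1.html": (
        "<table>"
        "<tr class='c'><td>110101001001</td><td>Community A</td></tr>"
        "<tr class='c'><td>n/a</td><td>Broken row</td></tr>"
        "</table>"
    ),
    "d2.html": (
        "<table>"
        "<tr class='c'><td>110102002001</td><td>Community B</td></tr>"
        "</table>"
    ),
}

# Assumed record format: a 12-digit administrative division code.
CODE_RE = re.compile(r"^\d{12}$")


def crawl(url, depth=0, max_depth=1):
    """Follow links level by level; yield (code, name) from leaf pages."""
    root = ET.fromstring(PAGES[url])
    if depth < max_depth:
        # Intermediate level: select every link with XPath and descend.
        for link in root.findall(".//a[@href]"):
            yield from crawl(link.get("href"), depth + 1, max_depth)
    else:
        # Leaf level: select community rows, then filter with the regex.
        for row in root.findall(".//tr[@class='c']"):
            code, name = (td.text or "" for td in row.findall("td"))
            if CODE_RE.match(code):  # malformed rows are dropped
                yield code, name


def store(rows):
    """Persist rows; SQLite stands in for the MySQL database here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE community (code TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO community VALUES (?, ?)", rows)
    conn.commit()
    return conn


def estimate_casualties(population, casualty_rate):
    """Preliminary estimate: population times magnitude-specific rate."""
    return population * casualty_rate
```

Calling `store(list(crawl("index.html")))` walks the index page, follows both district links, and keeps only the rows whose code matches the pattern. With a real site, `max_depth` would be set to the number of link levels between the index and the community pages.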
Authors
LI Tong (李通)
YAO Xin-qiang (姚新强)
LI Tong; YAO Xin-qiang (Emergency Management Institute, Institute of Disaster Prevention, Langfang 065201, China; Earthquake Disaster Prevention Center, Tianjin Earthquake Agency, Tianjin 300201, China)
Source
Software Guide (《软件导刊》)
2021, No. 11, pp. 152-157 (6 pages)