摘要
为实现大规模矢量数据的高性能处理,在开源项目Hadoop基础上,设计与开发了一个基于MapReduce的矢量数据分布式计算系统。根据矢量空间数据的特点,通过分析Key/Value数据模型及GeoJSON地理数据编码格式,构建了可存储于Hadoop hdfs的矢量数据Key/Value文本文件格式;探讨矢量数据的MapReduce计算过程,对Map数据分片、并行处理过程及Reduce结果合并等关键步骤进行了详细阐述;基于上述技术,建立了矢量数据分布式计算原型系统,详细介绍系统组成,并将其应用于处理关中地区1∶10万土地利用矢量空间数据,取得较好效果。
The paper designs a vector spatial data distributed computing system based on Open Source Hadoop Projects, in or- der to satisfy the needs of massive vector data. According to the characteristics of the vector spatial data, Key/Value data model and GeoJSON data format, the paper brings forward a distributed Key/Value storage method for vector spatial data based on HDFS. The key techniques on how to computing large-scale vector spatial data based on MapReduce are elaborated in detail, in- cluding data partitioning and parallel processing mechanism of Map step, results merging of Reduce step. A vector spatial data distributed computing prototype system is developed using Open Source Hadoop projects and applied to deal with the 1 : 100, 000 land use data of Guanzhong area in China. The evaluation result indicates that the Hadoop MapReduce can significantly leverage the performance of vector spatial data analysis, especially when more computing nodes are used.
出处
《计算机工程与应用》
CSCD
2013年第16期25-29,共5页
Computer Engineering and Applications
基金
国家自然科学基金(No.41101364)
中国科学院地理科学与资源研究所"一三五"战略科技计划项目(No.2012ZD010)