Abstract
Traditional single-machine conversion tools and parallel conversion algorithms based on range partitioning suffer from poor scalability and data skew. To address this, a two-step decoding parallel conversion algorithm for spatial vector data (SVD) is proposed. The storage encoding schema of SVD in the geospatial database (GDB) is summarized, and an optimized geometry decoding function is built on it as the basic tool. In the first decoding step, only the spatial metadata is parsed, and the parsing tasks are balanced according to geometric complexity so that the parsing workload is better matched to the data volume. In the second decoding step, the compressed geometry bytes are extracted and parsed with a parallel geometry parsing mechanism to improve conversion efficiency. The algorithm is implemented on Apache Spark and compared with the ArcGIS single-machine conversion tool and a parallel query conversion algorithm based on range partitioning (RangePartitioner). The experimental results show that the proposed algorithm has clear advantages in efficiency and scalability: conversion efficiency improves by 2.5 to 117 times, and the data skew caused by uneven geometric complexity is greatly reduced.
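The abstract does not say how the first decoding step balances parsing tasks, so the following is only a minimal sketch of one plausible approach. It assumes each feature's geometric complexity can be summarized as a single weight (for example a vertex count read from the spatial metadata) and uses a greedy longest-processing-time assignment. The function name `balance_by_complexity`, the weights, and the feature ids are all hypothetical and are not taken from the paper.

```python
import heapq


def balance_by_complexity(complexities, num_partitions):
    """Assign features to partitions so that total complexity is roughly even.

    complexities: dict mapping a feature id to a complexity weight
                  (assumption: e.g. a vertex count taken from metadata).
    Returns (loads, assignment): total weight per partition and the
    list of feature ids placed in each partition.
    """
    # Min-heap of (current load, partition index): the lightest partition pops first.
    heap = [(0, p) for p in range(num_partitions)]
    heapq.heapify(heap)
    loads = [0] * num_partitions
    assignment = [[] for _ in range(num_partitions)]

    # Place the heaviest features first; each goes to the lightest partition so far.
    ordered = sorted(complexities.items(), key=lambda kv: kv[1], reverse=True)
    for feature_id, weight in ordered:
        load, p = heapq.heappop(heap)
        assignment[p].append(feature_id)
        loads[p] = load + weight
        heapq.heappush(heap, (loads[p], p))
    return loads, assignment


# Hypothetical weights: six features with very uneven geometric complexity.
weights = {"a": 100, "b": 60, "c": 40, "d": 30, "e": 20, "f": 10}
loads, assignment = balance_by_complexity(weights, 2)
print(loads, assignment)
```

With these made-up weights, splitting the features into two equal-sized ranges by id would give partitions of weight 200 and 60, whereas the weighted assignment gives two partitions of 130 each. In a Spark job, the resulting mapping from feature id to partition index could be fed to a custom partitioner, but that wiring is outside this sketch.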
Authors
孙乐乐
金宝轩
SUN Le-le; JIN Bao-xuan (College of Tourism and Geography Science, Yunnan Normal University, Kunming 650500, China; Yunnan Provincial Department of Natural Resources, Kunming 650224, China)
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, Issue 9, pp. 1768-1776, 1804 (10 pages in total)
Journal of Zhejiang University: Engineering Science
Funding
Supported by the National Natural Science Foundation of China (41661086).