Abstract
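Existing approaches to distributed storage of massive spatial data ignore spatial proximity and suffer from uneven distribution and data skew. To address these problems, this paper builds on the MapReduce parallel programming model and studies two ideas: hierarchical decomposition along the Hilbert space-filling curve, and node-capacity awareness. It proposes a hierarchical-decomposition strategy for parallel partitioning of spatial data, which achieves balanced storage through a critical-value (threshold) test. A case study shows that the method preserves the proximity of spatial data while resolving uneven distribution and data skew in the distributed storage of massive spatial data.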
Spatial data partitioning plays an important role in distributed spatial data storage, and its key problem is how to partition spatial data across distributed storage nodes in a network environment. This paper reviews partitioning strategies for massive spatial data and analyzes their main shortcoming: they do not take spatial object size or spatial proximity into account. To address this, it proposes a new parallel spatial data partitioning strategy based on MapReduce and a capacity-aware method that improves load balance and avoids uneven data storage and data skew. Experimental analysis shows that the proposed parallel partitioning algorithm not only achieves better storage load balance in a distributed storage system but also preserves the spatial locality of data objects after partitioning.
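The abstract describes the method only at a high level. As a rough illustration, the Python sketch below shows one way Hilbert-curve hierarchical decomposition with a threshold test could be combined with capacity-aware assignment to storage nodes. It is a sketch under stated assumptions, not the authors' algorithm: records are assumed to be already mapped to integer cells on a 2^order by 2^order grid, the names `hilbert_key`, `decompose`, `assign_to_nodes`, `threshold` (maximum records per cell) and `capacities` (relative node capacities) are hypothetical, and the MapReduce parallelism is omitted (the per-record key computation would naturally run in the map phase, the per-cell counting in the reduce phase).

```python
from bisect import bisect_left, bisect_right


def hilbert_key(order, x, y):
    """Map grid cell (x, y) on a 2**order x 2**order grid to its Hilbert index."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next sub-curve is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def decompose(sorted_keys, order, threshold):
    """Hierarchically split Hilbert cells until each holds <= threshold records.

    A cell at level L covers a contiguous key range of length 4**(order - L),
    so each split divides the range into four equal sub-ranges. Returns the
    non-empty leaf cells as (start_key, end_key, count) in Hilbert order.
    """
    leaves = []

    def split(start, span, level):
        count = bisect_left(sorted_keys, start + span) - bisect_left(sorted_keys, start)
        if count == 0:
            return
        # Threshold test: stop when the cell is small enough or is already finest-level.
        if count <= threshold or level == order:
            leaves.append((start, start + span, count))
            return
        quarter = span // 4
        for i in range(4):
            split(start + i * quarter, quarter, level + 1)

    split(0, 4 ** order, 0)
    return leaves


def assign_to_nodes(leaves, capacities):
    """Assign contiguous runs of leaf cells to nodes in proportion to capacity."""
    total = sum(count for _, _, count in leaves)
    cap_sum = sum(capacities)
    bounds, acc = [], 0.0
    for cap in capacities:
        acc += total * cap / cap_sum  # cumulative target load per node
        bounds.append(acc)
    parts = [[] for _ in capacities]
    seen = 0
    for leaf in leaves:
        # Place the leaf on the node whose target range contains its midpoint.
        node = min(bisect_right(bounds, seen + leaf[2] / 2), len(capacities) - 1)
        parts[node].append(leaf)
        seen += leaf[2]
    return parts
```

Because each node receives a contiguous run of Hilbert-ordered cells, spatially close records tend to stay on the same node, while the threshold test keeps dense regions from forming oversized cells. A single finest-level cell can still exceed the threshold (for example, many records at one location); handling that case would need an additional rule not shown here.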
Source
《测绘通报》
CSCD
Peking University Core Journal
2017, No. 11, pp. 96-100 (5 pages)
Bulletin of Surveying and Mapping
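Funding
National Key Research and Development Program of China (2016YFB0502603)
Natural Science Foundation of Hubei Province (ZRY2015001543)
Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (1610491B20)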