Abstract: This paper proposes a novel data indexing scheme, the distributed access pattern R-tree (DAPR-tree), for spatial data retrieval in a distributed computing environment. Compared with traditional distributed indexing schemes, the DAPR-tree takes data access patterns into account during the indexing utilization stage, so that a more balanced indexing structure can be provided for spatial applications (e.g., a Digital Earth data warehouse). In this new indexing scheme, (a) an indexing penalty matrix is proposed that considers the balance of data volume, topology, and access load between different indexing nodes; (b) an 'access possibility' element is integrated into a classic 'Master-Client' structure for a distributed indexing environment; and (c) an indexing algorithm for the DAPR-tree is provided for index implementations. Using a replica of the official GEOSS Clearinghouse system as a case study, the DAPR-tree was evaluated in a number of scenarios. The results show that, by exploiting data access patterns, our indexing scheme generally outperforms traditional distributed indices by around 9%. Finally, we discuss the applicability of the DAPR-tree and document its shortcomings to encourage researchers pursuing related topics in Big Data indexing for Digital Earth and other geospatial initiatives.
Funding: This work was funded by the National Key R&D Program of China [grant number 2018YFB2100704]; the Science, Technology and Innovation Commission of Shenzhen Municipality [grant numbers JCYJ20170412142239369, JCYJ20170818101704025]; and the National Natural Science Foundation of China [grant numbers 41701444, 71961137003, 41971341].
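The abstract describes a penalty that balances data volume, topology, and access load across indexing nodes, but it does not give the formula. The following is only a minimal sketch of how such a choice could look, not the paper's penalty matrix: the `Node` class, the three terms, the equal default weights, and the reduction to a single scalar score are all assumptions made for illustration. The topology term uses bounding-box enlargement, the classic R-tree insertion criterion.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical index node (not from the paper)."""
    mbr: tuple    # minimum bounding rectangle: (xmin, ymin, xmax, ymax)
    count: int    # number of data items already indexed on this node
    load: float   # accumulated access load observed on this node


def area(box):
    """Area of an axis-aligned rectangle."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def union(a, b):
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def penalty(node, box, access_prob, totals, weights=(1.0, 1.0, 1.0)):
    """Illustrative penalty of placing `box` on `node`; lower is better.

    Assumed terms: share of items, MBR enlargement, share of access load.
    """
    total_count, total_load = totals
    w_count, w_topo, w_load = weights
    count_term = (node.count + 1) / (total_count + 1)
    topo_term = area(union(node.mbr, box)) - area(node.mbr)
    denom = total_load + access_prob
    load_term = (node.load + access_prob) / denom if denom > 0 else 0.0
    return w_count * count_term + w_topo * topo_term + w_load * load_term


def choose_node(nodes, box, access_prob, weights=(1.0, 1.0, 1.0)):
    """Return the index of the node with the smallest assumed penalty."""
    totals = (sum(n.count for n in nodes), sum(n.load for n in nodes))
    return min(
        range(len(nodes)),
        key=lambda i: penalty(nodes[i], box, access_prob, totals, weights),
    )
```

With two nodes that cover the same area and hold the same number of items, this sketch sends a frequently accessed object to the node with the lighter access load, which is the balancing behavior the abstract attributes to access patterns. The weights would need tuning in practice, since the enlargement term is in squared coordinate units while the other two terms are fractions.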