A Line Grouping Algorithm Based on Density
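Abstract Geospatial data processing currently suffers from low algorithmic efficiency caused by swelling data volumes and rapidly growing computational cost, and improving efficiency through a "divide and conquer" data grouping strategy has become a research focus. For unevenly distributed line data, this paper proposes a density-based line data grouping algorithm (LGAD). First, the algorithm extracts sample line segments by locating high-density regions, which ensures that grouping starts in dense areas. Second, to account for the complexity of the topological relations among lines, it measures the distance between segments using horizontal, vertical, and angular distances, and builds a distance matrix between the sample segments and the other segments. Finally, it uses the distance matrix together with an optimal selection method to produce load-balanced groups. Experiments show that the algorithm offers a clear time advantage across both stages, grouping the data and clustering the line segments of the grouped data. Compared with serial computation, the average ratio reaches 4.3 for 2 to 12 groups, which improves application response time and is of practical value.

English abstract Parallel computing provides a promising solution for accelerating complicated spatial data processing, which is becoming increasingly computationally intensive. Partitioning large datasets into workload-balanced subgroups remains a challenge, particularly for unevenly distributed spatial data. In this study, a density-based data grouping algorithm was developed to tackle the partition problem for large line data. The algorithm includes three procedures: (1) extracting representative segment samples based on the data density distribution; (2) generating a distance matrix between the segment samples and the rest of the data using three line distance measurements; (3) grouping line segments so that the data load is balanced. Experiments show that the algorithm is able to partition large line data efficiently and evenly into equally sized subgroups. Parallel interpolation saves up to 65% of the execution time in comparison with sequential interpolation. High parallel efficiency was achieved when the datasets were divided into an optimal number of child data groups.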
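Source: Journal of Geo-information Science (《地球信息科学学报》), CSCD, Peking University Core Journal, 2015, No. 5, pp. 538-546 (9 pages)
Funding: Marine Public Welfare Special Project "Research on the Application of a Cloud Computing and Cloud Service Framework for Marine Environmental Information" (201105033); Marine Forecast Integrated Information System (MiFSIS) Research and Application Project (201105017)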
Keywords: divide and conquer; parallel computing; unevenly distributed data; line data grouping (line segments); load balancing (balanced workloads)
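The abstract names three distance components (horizontal, vertical, and angular) and a load-balanced grouping step, but gives no formulas. The following Python sketch is therefore only an illustration under stated assumptions: it takes the perpendicular, parallel, and angle distances as defined in the trajectory clustering framework of Lee, Han, and Whang (2007), and it replaces the paper's "optimal selection method" with a simple greedy, capacity-limited assignment. The names `segment_distance` and `balanced_groups`, the weights, and the assumption that sample segment indices are already known (step 1, the density-based sample extraction, is not shown) are all hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def _norm(a):
    return math.hypot(a[0], a[1])

def segment_distance(seg_a, seg_b, w_perp=1.0, w_par=1.0, w_ang=1.0):
    """Weighted sum of perpendicular, parallel, and angle distances.
    Assumption: definitions follow Lee, Han, and Whang (2007), not
    necessarily the exact measures used in the paper."""
    # The longer segment acts as the reference; the shorter one is projected onto it.
    ref, other = sorted((seg_a, seg_b), key=lambda s: -_norm(_sub(s[1], s[0])))
    (s_i, e_i), (s_j, e_j) = ref, other
    v = _sub(e_i, s_i)
    len_i = _norm(v)
    if len_i == 0:  # both segments are points: fall back to point distance
        return _norm(_sub(s_j, s_i))

    def project(p):
        u = _dot(_sub(p, s_i), v) / (len_i ** 2)
        return (s_i[0] + u * v[0], s_i[1] + u * v[1])

    p_s, p_e = project(s_j), project(e_j)
    l1, l2 = _norm(_sub(s_j, p_s)), _norm(_sub(e_j, p_e))
    d_perp = 0.0 if l1 + l2 == 0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)
    d_par = min(min(_norm(_sub(p_s, s_i)), _norm(_sub(p_s, e_i))),
                min(_norm(_sub(p_e, s_i)), _norm(_sub(p_e, e_i))))
    w = _sub(e_j, s_j)
    len_j = _norm(w)
    if len_j == 0:
        d_ang = 0.0
    else:
        cos_t = _dot(v, w) / (len_i * len_j)
        # Angles above 90 degrees count the full length of the shorter segment.
        d_ang = len_j if cos_t < 0 else len_j * math.sqrt(max(0.0, 1 - cos_t ** 2))
    return w_perp * d_perp + w_par * d_par + w_ang * d_ang

def balanced_groups(segments, sample_idx, dist=segment_distance):
    """Greedy stand-in for the grouping step: each segment joins the nearest
    sample whose group is not yet full (capacity = ceil(n / k))."""
    k = len(sample_idx)
    capacity = math.ceil(len(segments) / k)
    # Distance matrix: one row per sample segment, one column per segment.
    matrix = [[dist(segments[s], seg) for seg in segments] for s in sample_idx]
    groups = [[] for _ in range(k)]
    # Place the segments that are closest to some sample first.
    order = sorted(range(len(segments)), key=lambda j: min(row[j] for row in matrix))
    for j in order:
        for g in sorted(range(k), key=lambda g: matrix[g][j]):
            if len(groups[g]) < capacity:
                groups[g].append(j)
                break
    return groups
```

With this sketch, no group holds more than `ceil(n / k)` segments, so a dense region that would otherwise dominate one group spills its farthest segments into the next nearest group. A real implementation would likely need a spatial index rather than a full distance matrix for large datasets.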