Parallel computing with OpenCL for flow accumulation (基于OpenCL的累积汇流并行计算)
Abstract Growing demand for large-scale, high-resolution digital terrain data poses new challenges for computationally intensive digital terrain analysis algorithms such as flow accumulation. Targeting heterogeneous CPU/GPU (Graphics Processing Unit) computing platforms, this paper proposes a parallelization strategy for the multiple flow direction flow accumulation algorithm based on OpenCL (Open Computing Language). The strategy offers better platform independence and portability, and it simplifies the design of parallel applications on heterogeneous CPU/GPU platforms. The parallel flow accumulation algorithm consists of two processes: outflow allocation, which is independent in both space and time, and inflow accumulation, which is spatially dependent. Both processes are defined as OpenCL kernels and executed in parallel on OpenCL devices. For inflow accumulation, a flow transfer matrix is used to convert the recursive formulation into an iterative one so that it can be computed in parallel. Compared with the parallel algorithm based on the flow transfer matrix, the parallel algorithm based on the cell indegree matrix (from graph theory) reduces redundant computation during the iteration, but it requires high-latency atomic operations and more iterations. With limited GPU computing resources, the two algorithms show no obvious difference in performance. Experimental results show that the parallel flow accumulation algorithm achieves a good speedup on an NVIDIA GeForce GT 650M GPU and that the speedup grows with grid size: about 50 to 70 times for outflow allocation and 10 to 20 times for inflow accumulation. These results demonstrate the potential advantages of using OpenCL on parallel computing devices such as GPUs for large-scale digital terrain analysis.
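The two inflow accumulation variants described in the abstract can be illustrated with a small serial sketch. This is not the paper's OpenCL implementation: the slope-proportional weighting, the 3x3 elevation grid, and all function names (`flow_fractions`, `accumulate_iterative`, `accumulate_indegree`) are assumptions made for illustration. Each loop over cells stands in for what would be one work-item per cell in a kernel.

```python
# Illustrative sketch only. The weighting rule, names, and sample grid are
# assumptions; the paper's actual kernels are not reproduced here.

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_fractions(dem):
    """Outflow allocation: split each cell's flow among its lower neighbors,
    proportionally to slope (assumed rule). Cells do not depend on each other."""
    rows, cols = len(dem), len(dem[0])
    frac = {}
    for r in range(rows):
        for c in range(cols):
            drops = {}
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and dem[nr][nc] < dem[r][c]:
                    dist = (dr * dr + dc * dc) ** 0.5
                    drops[(nr, nc)] = (dem[r][c] - dem[nr][nc]) / dist
            total = sum(drops.values())
            frac[(r, c)] = {n: d / total for n, d in drops.items()}
    return frac


def accumulate_iterative(frac, max_iters=1000):
    """Transfer-weight variant: repeat acc = 1 + sum(w * acc_upstream) until
    nothing changes. Every cell reads only the previous iterate, so all cells
    could be updated in parallel, at the cost of recomputing settled cells."""
    inflow = {cell: [] for cell in frac}
    for src, outs in frac.items():
        for dst, w in outs.items():
            inflow[dst].append((src, w))
    acc = {cell: 1.0 for cell in frac}
    for iteration in range(1, max_iters + 1):
        new = {cell: 1.0 + sum(w * acc[src] for src, w in inflow[cell])
               for cell in frac}
        if new == acc:
            return new, iteration
        acc = new
    raise RuntimeError("flow accumulation did not converge")


def accumulate_indegree(frac):
    """Indegree variant: only cells whose upstream inflow is complete push
    flow downstream. On a GPU, the two updates marked below would need
    atomic operations because several sources may share one destination."""
    indeg = {cell: 0 for cell in frac}
    for outs in frac.values():
        for dst in outs:
            indeg[dst] += 1
    acc = {cell: 1.0 for cell in frac}
    frontier = [cell for cell, d in indeg.items() if d == 0]
    rounds = 0
    while frontier:
        rounds += 1
        next_frontier = []
        for src in frontier:
            for dst, w in frac[src].items():
                acc[dst] += w * acc[src]   # atomic add on a GPU
                indeg[dst] -= 1            # atomic decrement on a GPU
                if indeg[dst] == 0:
                    next_frontier.append(dst)
        frontier = next_frontier
    return acc, rounds


# Hypothetical 3x3 elevation grid with a single outlet in the bottom-right corner.
dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 5, 1]]
frac = flow_fractions(dem)
acc_iter, n_iters = accumulate_iterative(frac)
acc_indeg, n_rounds = accumulate_indegree(frac)
```

In this sketch every cell contributes one unit of flow, so the outlet cell should collect the flow of all nine cells under either variant. The iterative variant trades redundant recomputation for independent per-cell updates, while the indegree variant touches each flow link once but relies on shared updates to downstream cells.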
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 3, pp. 22-29, 116 (9 pages in total)
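Funding Supported by the Science and Technology Support Program of the Jiangxi Provincial Department of Science and Technology (No. 2010BGA00900) and the Key Discipline Construction Project of Jiangxi Provincial Higher Education Institutions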
Keywords parallel computing; flow accumulation; Graphics Processing Unit (GPU); Open Computing Language (OpenCL)