GPU-Accelerated Parallel Implementation of a Consistent Image Deformation Method

GPU accelerated implementation of consistent image transformations estimation
Abstract: This paper proposes a parallel implementation strategy on the GPU platform for a recently proposed fast consistent transformation estimation method. First, a branch optimization method is proposed to counter the performance degradation that GPUs suffer when threads contain many branches. A Basic Block Vector (BBV) describing the behavior of each thread is obtained with a simulation tool, and the volume data are partitioned into blocks at an optimal offset so that threads with similar execution paths are grouped into the same warp as far as possible; the resulting thread assignment reduces the efficiency loss caused by branching. Second, three storage strategies (global memory, texture memory, and shared memory) are compared for implementing the linear interpolation algorithm. Shared memory is selected for the data accesses that interpolation requires, and the interpolation error at data-partition boundaries is analyzed. Finally, a reduction strategy is used to sum the consistency errors over the volume data, which improves summation efficiency on the GPU. Experiments on three-dimensional images show that the branch optimization strategy improves performance by 6%, that the shared memory strategy outperforms the global memory and texture memory strategies, that the error introduced by the approximate interpolation algorithm has little effect on convergence, and that reduction-based summation markedly improves summation efficiency. The results indicate that the proposed method achieves a speedup of about 110 on the NVIDIA C2050 GPU platform.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2017, Issue A01, pp. 49-53, 57 (6 pages)
Funding: National Natural Science Foundation of China (U1301251)
Keywords: Graphics Processing Unit (GPU); parallel computation; image registration; consistent transformation; speedup
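
The reduction summation mentioned in the abstract can be illustrated with a small sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names `block_reduce_sum` and `reduce_sum`, the block size, and the zero-padding of the last block are hypothetical, and the code emulates on the CPU, in plain Python, the general shape of a per-block tree reduction rather than the authors' GPU kernel.

```python
def block_reduce_sum(values, block_size=8):
    """Emulate a per-block tree reduction; returns one partial sum per block.

    Hypothetical helper for illustration only. block_size must be a power
    of two and at least 2, mirroring a typical GPU thread-block size.
    """
    if block_size < 2 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two >= 2")
    partial_sums = []
    for start in range(0, len(values), block_size):
        # Copy one block into a scratch list standing in for shared memory,
        # zero-padding the final block if it is short.
        shared = list(values[start:start + block_size])
        shared += [0.0] * (block_size - len(shared))
        stride = block_size // 2
        while stride > 0:
            # On a GPU each addition in this inner loop would be done by a
            # different thread, with a barrier before the stride is halved.
            for tid in range(stride):
                shared[tid] += shared[tid + stride]
            stride //= 2
        partial_sums.append(shared[0])
    return partial_sums


def reduce_sum(values, block_size=8):
    """Apply block reductions repeatedly until a single value remains."""
    data = list(values)
    if not data:
        return 0.0
    while len(data) > 1:
        data = block_reduce_sum(data, block_size)
    return data[0]


if __name__ == "__main__":
    # Stand-in for per-voxel consistency errors (made-up values).
    errors = [float(i) for i in range(1, 101)]
    print(reduce_sum(errors))
```

Each pass halves the number of active additions, so a block of `block_size` elements is summed in log2(`block_size`) steps instead of `block_size - 1` sequential additions, which is the usual reason a reduction is preferred over a serial sum on a GPU.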