Abstract
To address shortcomings of current graphics processing unit (GPU) implementations of the PP problem in the fast multipole method (FMM), such as load imbalance and a computational scale limited by the size of video memory, a new implementation based on the Compute Unified Device Architecture (CUDA) platform is proposed. The method takes the Box as the unit of parallelism, allocates a buffer in main (host) memory, and pipelines the computation across multiple threads. These choices suit it to the heterogeneous architecture formed by the CPU and GPU and let it fully exploit the high parallelism of the CUDA programming model to accelerate the PP problem. Experimental results show that CUDA acceleration markedly reduces the computation time of the PP problem and improves the efficiency of the overall FMM simulation, making the method suitable for real-time simulation of various many-body problems.
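As a rough illustration of the Box-as-parallel-unit idea, the following is a minimal CPU-side sketch in Python, not the paper's CUDA code. It assumes that "PP" denotes the near-field direct summation between particles in a box and its adjacent boxes (the abstract does not expand the abbreviation), uses a 2D uniform grid and a 1/r kernel for simplicity, and assumes all particle positions are distinct. The names `build_boxes`, `pp_for_box`, `pp_all`, and `batch_size` are hypothetical. Batching the boxes only stands in for staging work through a host-memory buffer; on a GPU each box would plausibly map to one thread block, with batches overlapped by worker threads.

```python
from collections import defaultdict
from itertools import product


def build_boxes(points, box_size):
    """Map each integer box index (ix, iy) to the ids of the particles inside it."""
    boxes = defaultdict(list)
    for pid, (x, y) in enumerate(points):
        boxes[(int(x // box_size), int(y // box_size))].append(pid)
    return boxes


def pp_for_box(box, boxes, points, charges):
    """Sum q_j / r_ij for every target in `box` over sources in the box and its 8 neighbours."""
    ix, iy = box
    # Gather source particles from the 3x3 block of boxes centred on `box`.
    sources = [j for dx, dy in product((-1, 0, 1), repeat=2)
               for j in boxes.get((ix + dx, iy + dy), [])]
    result = {}
    for i in boxes[box]:
        xi, yi = points[i]
        total = 0.0
        for j in sources:
            if j != i:  # skip the self-interaction
                xj, yj = points[j]
                total += charges[j] / ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5
        result[i] = total
    return result


def pp_all(points, charges, box_size, batch_size=4):
    """Process non-empty boxes in fixed-size batches; each box is an independent work unit."""
    boxes = build_boxes(points, box_size)
    keys = sorted(boxes)
    potentials = [0.0] * len(points)
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]  # boxes staged together (buffer stand-in)
        for box in batch:
            for i, value in pp_for_box(box, boxes, points, charges).items():
                potentials[i] = value
    return potentials
```

Because every box writes only to its own particles, the boxes in a batch do not depend on one another, which is the property that makes the box a natural unit for distributing work. The sketch does nothing to even out boxes that hold very different numbers of particles.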
Source
Computer Engineering and Design (《计算机工程与设计》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2011, Issue 9, pp. 3050-3053, 3169 (5 pages in total)
Funding
Shanghai Leading Academic Discipline Project (J50103)