
Research and Implementation of the PP Problem of the FMM Algorithm on GPU

Cited by: 2
Abstract: Current GPU implementations of the PP problem in the fast multipole method (FMM) have shortcomings such as load imbalance and a computational scale limited by the size of video memory. This paper presents a new implementation based on the CUDA (Compute Unified Device Architecture) platform. The method is adapted to the heterogeneous architecture formed by the CPU and GPU in three ways:

- it uses the Box as the unit of parallelism;
- it allocates buffers in host memory;
- it pipelines the computation across multiple threads.

Together, these make full use of the high parallelism of the CUDA programming model to accelerate the PP problem. Experimental results show that CUDA acceleration significantly reduces the computation time of the PP problem and improves the efficiency of the whole FMM simulation. This makes the method suitable for real-time simulation of various N-body problems.
Source: Computer Engineering and Design (《计算机工程与设计》), 2011, No. 9, pp. 3050-3053, 3169 (5 pages). Indexed in CSCD; Peking University core journal.
Funding: Shanghai Key Discipline Construction Fund Project (J50103)
Keywords: graphics processing unit (GPU); heterogeneous architecture; Compute Unified Device Architecture (CUDA); fast multipole method (FMM); PP problem
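The abstract names three ingredients: Box-level parallelism, a buffer in host memory, and multi-threaded pipelining. The sketch below shows, under stated assumptions, how these could fit together for the PP step. It is not the authors' CUDA code.

Assumptions, all hypothetical and not taken from the paper:

- "PP" is read as the FMM particle-to-particle (near-field) direct interaction.
- Particles sit in a uniform 2D grid of leaf boxes.
- Each box interacts with itself and its 8 neighbors.
- The kernel is a plain 1/r potential.

All function names are illustrative. The producer thread packs one work item per box into a bounded queue, which stands in for the host-memory buffer. The consumer thread evaluates each box independently and marks where a CUDA kernel launch would go; in a CUDA version, one box would plausibly map to one thread block.

```python
import math
import queue
import threading
from collections import defaultdict


def bin_particles(points, box_size):
    """Assign each particle index to the leaf box (ix, iy) containing it."""
    boxes = defaultdict(list)
    for idx, (x, y, _m) in enumerate(points):
        boxes[(int(x // box_size), int(y // box_size))].append(idx)
    return boxes


def neighbor_boxes(key):
    """Hypothetical near-field list: the box itself plus its 8 adjacent boxes."""
    ix, iy = key
    return [(ix + dx, iy + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def p2p_box(targets, sources, points):
    """Direct PP sum for one box: phi_i = sum_j m_j / |r_i - r_j|, j != i."""
    out = {}
    for i in targets:
        xi, yi, _ = points[i]
        phi = 0.0
        for j in sources:
            if j == i:
                continue
            xj, yj, mj = points[j]
            phi += mj / math.hypot(xi - xj, yi - yj)
        out[i] = phi
    return out


def pipelined_pp(points, box_size, buffer_slots=4):
    """Two-stage pipeline over boxes; at most buffer_slots items are in flight."""
    boxes = bin_particles(points, box_size)
    staging = queue.Queue(maxsize=buffer_slots)  # bounded buffer between stages
    result = {}

    def producer():
        # Stage 1 (host side): gather each box's near-field sources.
        for key, targets in boxes.items():
            sources = [j for nb in neighbor_boxes(key) for j in boxes.get(nb, [])]
            staging.put((targets, sources))  # blocks while the buffer is full
        staging.put(None)  # sentinel: no more boxes

    def consumer():
        # Stage 2 (stand-in for the GPU): evaluate one packed box at a time.
        while True:
            item = staging.get()
            if item is None:
                break
            targets, sources = item
            result.update(p2p_box(targets, sources, points))

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result
```

For example, `pipelined_pp([(0.1, 0.1, 1.0), (0.4, 0.2, 2.0), (2.5, 2.5, 1.0)], box_size=1.0)` gives the third particle a PP potential of `0.0`, because its box is not adjacent to the first box. Its contribution would come from the far-field (multipole) part of FMM instead. The bounded queue reflects one plausible reading of the abstract's buffering idea. Only `buffer_slots` packed boxes exist at any time, so the problem size is no longer tied to how much fits in device memory at once.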