Research and Implementation of the PP Problem of the FMM Algorithm on GPU (FMM算法中PP问题在GPU上的研究与实现)

Cited by: 2

Abstract: Current graphics processing unit (GPU) implementations of the PP problem in the fast multipole method (FMM) have shortcomings such as load imbalance and a computational scale limited by the size of video memory. To address these, a new implementation based on the Compute Unified Device Architecture (CUDA) platform is proposed. The method takes the Box as the unit of parallelism, allocates a buffer in main memory, and pipelines the computation across multiple threads. This makes it suitable for heterogeneous architectures composed of a CPU and a GPU and exploits the high parallelism of the CUDA programming model to accelerate the PP problem. Experimental results show that CUDA acceleration markedly reduces the computation time of the PP problem and improves the efficiency of the whole FMM simulation, making the method suitable for real-time simulation of various N-body problems.

Authors: LI Zhengjie (李正杰), XU Weimin (徐炜民), CHAI Yahui (柴亚辉), ZHENG Yanheng (郑衍衡)

Affiliation: School of Computer Engineering and Science, Shanghai University

Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2011, Issue 9, pp. 3050-3053, 3169 (5 pages in total)

Funding: Shanghai Leading Academic Discipline Project (J50103)

Keywords: graphics processing unit (GPU); heterogeneous architecture; Compute Unified Device Architecture (CUDA); fast multipole method (FMM); PP problem

Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]
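
Illustrative sketch: the abstract names three ideas (the Box as the unit of parallel work, a buffer in main memory, and pipelined computation) but gives no implementation details. The Python sketch below is a CPU-only illustration of the first two, not the authors' CUDA code. It assumes that the PP problem means the direct particle-to-particle near-field sum between a box and its adjacent boxes, and it uses a 1/r kernel chosen purely for illustration. All function names, the `(x, y, z, q)` particle layout, and the `buffer_capacity` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_boxes(particles, box_size):
    """Bin particles (x, y, z, q) into cubic boxes keyed by integer grid index."""
    boxes = defaultdict(list)
    for i, (x, y, z, _q) in enumerate(particles):
        key = (int(x // box_size), int(y // box_size), int(z // box_size))
        boxes[key].append(i)
    return boxes


def box_tasks(boxes):
    """Yield one task per box: its target particles and all neighbour sources."""
    for (bx, by, bz), targets in sorted(boxes.items()):
        sources = []
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            sources.extend(boxes.get((bx + dx, by + dy, bz + dz), ()))
        yield targets, sources


def batches(tasks, buffer_capacity):
    """Group box tasks so each batch's source count fits a fixed-size buffer.

    A single task larger than the buffer still forms its own batch.
    """
    batch, used = [], 0
    for targets, sources in tasks:
        if batch and used + len(sources) > buffer_capacity:
            yield batch
            batch, used = [], 0
        batch.append((targets, sources))
        used += len(sources)
    if batch:
        yield batch


def pp_potentials(particles, box_size, buffer_capacity):
    """Near-field potential phi_i = sum_j q_j / r_ij over adjacent boxes.

    Assumes all particle positions are distinct (no zero distances).
    """
    phi = [0.0] * len(particles)
    boxes = build_boxes(particles, box_size)
    for batch in batches(box_tasks(boxes), buffer_capacity):
        # On a GPU, a batch would correspond to one staged transfer plus one
        # kernel launch, with one thread block per box. Here it is a plain loop.
        for targets, sources in batch:
            for i in targets:
                xi, yi, zi, _ = particles[i]
                for j in sources:
                    if i != j:
                        xj, yj, zj, qj = particles[j]
                        r2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                        phi[i] += qj / r2 ** 0.5
    return phi
```

Because every box is an independent task, the batch size bounds how much data must be resident at once regardless of the total particle count, which is one plausible reading of how a main-memory buffer lifts the video-memory limit. Overlapping the transfer of one batch with the computation of the previous one (the pipelining mentioned in the abstract) is not shown here.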



