
Parallelization and Optimization of Laser-Plasma-Interaction Simulation
Abstract: Progress in generating intense ultra-short laser pulses increasingly demands kinetic descriptions of how such pulses interact with plasmas. The particle-in-cell (PIC) algorithm is a widely used method in plasma physics for studying the trajectories of charged particles under electromagnetic fields. Although PIC has already been implemented on GPUs, several important issues specific to laser-plasma-interaction (LPI) simulation still need to be examined in detail. This paper first ports the whole of an original CPU-based LPI code to the GPU as a parameterized, adaptive implementation. It then develops a series of methods to speed up the particle scatter phase of this initial GPU version: a dynamic duplication algorithm, mixed-precision computing, and a parameterized particle sorting algorithm. Furthermore, it applies the GPUDirect RDMA (remote direct memory access) technique on a Kepler GPU cluster and evaluates how much it improves MPI communication performance. Numerical experiments show that, using the same number of GPUs, these optimizations speed up the key "Scatter" phase by 6.1x over the initial GPU version, and speed up MPI communication by 2.8x for messages larger than 3 KB. These findings demonstrate that optimizations tailored to the features of the simulation and of modern GPU clusters are essential for significantly improved performance.
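The abstract names three optimizations for the scatter phase (particle-to-grid deposition) without describing them. The following is a sequential, CPU-side Python sketch under stated assumptions, not the authors' CUDA code. It uses 1D cloud-in-cell weighting on a periodic grid. It assumes "dynamic duplication" means keeping several private copies of the grid so that concurrent writers land on different copies, with the copy count chosen at runtime by a hypothetical heuristic (`choose_copies`). It sorts particles by cell with a counting sort, and it emulates mixed precision by storing particle data rounded to float32 (`to_f32`) while accumulating the grid in double precision. All function names are illustrative.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE-754 single (emulates float32 storage)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cell_of(x: float, dx: float, n_cells: int) -> int:
    """Index of the grid cell containing x on a periodic grid."""
    return math.floor(x / dx) % n_cells


def sort_by_cell(xs, ws, dx, n_cells):
    """Counting sort of particles by cell index (stable within each cell).

    On a GPU, neighbouring threads would then deposit into neighbouring
    grid nodes, which improves memory locality.
    """
    buckets = [[] for _ in range(n_cells)]
    for x, w in zip(xs, ws):
        buckets[cell_of(x, dx, n_cells)].append((x, w))
    ordered = [p for bucket in buckets for p in bucket]
    return [p[0] for p in ordered], [p[1] for p in ordered]


def choose_copies(xs, dx, n_cells, max_copies=8):
    """Hypothetical heuristic: use as many grid copies as the most crowded
    cell has particles (capped), since crowded cells cause write conflicts."""
    counts = [0] * n_cells
    for x in xs:
        counts[cell_of(x, dx, n_cells)] += 1
    return max(1, min(max_copies, max(counts)))


def scatter_cic(xs, ws, dx, n_cells, n_copies):
    """1D cloud-in-cell deposition onto a periodic grid.

    Particle i writes into private copy i % n_copies (standing in for a
    GPU thread writing to its own replica); the copies are then reduced.
    Accumulation uses Python floats, i.e. double precision.
    """
    copies = [[0.0] * n_cells for _ in range(n_copies)]
    for i, (x, w) in enumerate(zip(xs, ws)):
        grid = copies[i % n_copies]
        s = x / dx
        j = math.floor(s)
        frac = s - j  # distance from the left node, in cell units
        grid[j % n_cells] += w * (1.0 - frac)
        grid[(j + 1) % n_cells] += w * frac
    # Reduce the duplicated grids into the final charge density.
    return [sum(copy[k] for copy in copies) for k in range(n_cells)]


if __name__ == "__main__":
    dx, n_cells = 1.0, 4
    xs = [to_f32(v) for v in (0.25, 3.5, 1.0, 0.75)]  # positions stored as float32
    ws = [to_f32(v) for v in (1.0, 2.0, 0.5, 1.5)]    # weights stored as float32
    xs, ws = sort_by_cell(xs, ws, dx, n_cells)
    k = choose_copies(xs, dx, n_cells)
    rho = scatter_cic(xs, ws, dx, n_cells, k)
    print(k, rho)  # prints: 2 [2.125, 1.875, 0.0, 1.0]
```

Because every replica is summed at the end, the deposited density does not depend on the number of copies, and the total deposited charge equals the total particle weight. The copy count only changes how many writers can collide on the same grid address, which is what drives atomic-operation contention on a GPU.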
Authors: WU Haipeng; WEN Minhua; SEE Simon; LIN James (Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China; NVIDIA Technology Center, Singapore; Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2018, No. 4, pp. 550-558 (9 pages). Indexed in CSCD and the Peking University Core Journals list.
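Funding: National Key Research and Development Program of China (Nos. 2016YFB0201400, 2016YFB0201800); JSPS (Japan Society for the Promotion of Science) RONPAKU Program; NVIDIA GPU Center of Excellence Program.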
Keywords: laser-plasma-interaction simulation; particle-in-cell (PIC); compute unified device architecture (CUDA); CUDA optimization; GPUDirect RDMA
