Abstract
We present PhotoNs-GPU, a GPU-accelerated cosmological simulation code based on the Particle Mesh Fast Multipole Method (PM-FMM) algorithm, and focus on GPU utilization and optimization. A suitable interpolation method for the truncated gravity is introduced to speed up the evaluation of special functions in the kernels. We verify the GPU code in mixed precision and with different levels of the interpolation method. For current practical cosmological simulations, a single-precision run is roughly two times faster than a double-precision run, but it can introduce a small, unbiased noise in the power spectrum. Compared with the CPU version of PhotoNs and with Gadget-2, the efficiency of the new code is significantly improved. With all optimizations of memory access, kernel functions, and concurrency management activated, the peak performance of our test runs reaches 48% of the theoretical speed, and the average performance approaches ~35% on the GPU.
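As an illustration of interpolating a truncated-gravity factor, the sketch below tabulates a short-range force factor once and then evaluates it by linear interpolation, avoiding repeated calls to erfc and exp. It is a minimal sketch under stated assumptions, not the PhotoNs-GPU implementation: the truncation form is the one commonly used in TreePM codes such as Gadget-2, and the function names, table size, and cutoff (`short_range_factor`, `build_table`, `lookup`, `n=256`, `x_max=3.0`) are hypothetical choices made for this example.

```python
import math

def short_range_factor(x):
    """Assumed TreePM-style truncation factor, with x = r / (2 * r_s)."""
    return math.erfc(x) + (2.0 * x / math.sqrt(math.pi)) * math.exp(-x * x)

def build_table(x_max=3.0, n=256):
    """Tabulate the factor on a uniform grid of n intervals over [0, x_max]."""
    dx = x_max / n
    return dx, [short_range_factor(i * dx) for i in range(n + 1)]

def lookup(table, dx, x):
    """Linearly interpolate the tabulated factor; treat it as zero past the cutoff."""
    n = len(table) - 1
    if x >= n * dx:
        return 0.0
    i = min(int(x / dx), n - 1)  # guard against rounding at the upper edge
    t = x / dx - i               # fractional position inside interval i
    return table[i] * (1.0 - t) + table[i + 1] * t

dx, table = build_table()
# Largest interpolation error over a fine sampling of [0, x_max)
max_err = max(
    abs(lookup(table, dx, k * 0.001) - short_range_factor(k * 0.001))
    for k in range(3000)
)
```

A finer table lowers the interpolation error at the cost of more memory, which is the kind of trade-off that different interpolation levels would explore.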
Funding
the National SKA Program of China (Grant No. 2020SKA0110401)
the National Natural Science Foundation of China (Grant No. 12033008)
K. C. Wong Education Foundation