Funding: Supported by the National Key Program for Science and Technology Research and Development (2017YFB0203300), the National Natural Science Foundation of China (Grant Nos. 11403035, 11425312 and 11573030), and Royal Society Newton Advanced Fellowships.
Abstract: We introduce a new code for cosmological simulations, PHoToNs, which incorporates features for performing massive cosmological simulations on heterogeneous high-performance computing (HPC) systems and uses thread-oriented programming. PHoToNs adopts a hybrid scheme to compute the gravitational force: the conventional Particle-Mesh (PM) algorithm computes the long-range force, the Tree algorithm computes the short-range force, and the direct-summation Particle-Particle (PP) algorithm computes gravity between very close particles. A self-similar, space-filling Peano-Hilbert curve is used to decompose the computing domain. Thread programming is used to manage domain communication, PM calculation and synchronization more flexibly, as well as Dual Tree Traversal on the CPU+MIC platform. PHoToNs scales well, and the efficiency of the PP kernel reaches 68.6% of peak performance on MIC and 74.4% on CPU platforms. We also test the accuracy of the code against the widely used Gadget-2 code and find excellent agreement.
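The Peano-Hilbert domain decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the PHoToNs implementation: it works in 2D rather than 3D for brevity, uses the standard bit-manipulation mapping from grid cell to curve index, and the names `hilbert_key` and `decompose` are hypothetical. The idea is that cells sorted along the curve stay spatially close, so cutting the sorted list into contiguous chunks gives compact domains.

```python
def hilbert_key(n, x, y):
    """Map cell (x, y) on an n x n grid (n a power of two) to its
    index along a 2D Hilbert curve (standard xy-to-d mapping)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def decompose(cells, n, n_domains):
    """Sort particle cells along the curve and cut the ordering into
    contiguous chunks of (nearly) equal particle count.
    Returns a list of lists of indices into `cells`."""
    order = sorted(range(len(cells)), key=lambda i: hilbert_key(n, *cells[i]))
    size = -(-len(order) // n_domains)  # ceiling division
    return [order[k:k + size] for k in range(0, len(order), size)]


if __name__ == "__main__":
    # Illustrative input: one particle per cell of a 4 x 4 grid, 4 domains.
    cells = [(x, y) for x in range(4) for y in range(4)]
    for rank, members in enumerate(decompose(cells, 4, 4)):
        print(rank, [cells[i] for i in members])
```

With one particle per cell on a 4 x 4 grid and four domains, each domain comes out as one 2 x 2 quadrant, which shows the locality that makes the curve useful for load-balanced decomposition. A production code would use a 3D key and weight the cuts by computational cost rather than by particle count alone.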