Abstract
A multi-GPU parallel implementation of the lattice Boltzmann method (LBM) was developed using CUDA and MPI, and the program was used to benchmark the Tesla K80 and the Tesla P100. The results show that the Tesla P100 delivers far higher computational performance than the Tesla K80. On a single GPU, the P100 reaches its peak of 2880.0 MLUPS at a problem size of 256^3, while the K80 reaches its peak of 801.6 MLUPS at a problem size of 384^3. In multi-GPU runs, communication between GPUs causes a loss of computational performance, but the P100 still shows a considerable improvement over the K80. The execution time of the function LBCollProp and its share of the total program run time were measured at different problem sizes, from which the time the program needs to run a given number of time steps can be estimated.
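The throughput figures and the run-time estimate described in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes the standard definition of MLUPS (million lattice updates per second), and the helper names `mlups`, `seconds_per_step` and `estimate_total_seconds` are hypothetical. The extrapolation simply assumes that the kernel's share of total run time stays constant across time steps.

```python
def mlups(nx, ny, nz, steps, seconds):
    """Throughput in million lattice updates per second (standard definition)."""
    return nx * ny * nz * steps / seconds / 1e6


def seconds_per_step(nx, ny, nz, rate_mlups):
    """Wall time of one time step implied by a given MLUPS rate."""
    return nx * ny * nz / (rate_mlups * 1e6)


def estimate_total_seconds(kernel_seconds_per_step, kernel_fraction, steps):
    """Extrapolate total run time from the measured time of one kernel call
    and that kernel's share of total run time (assumed constant)."""
    if not 0.0 < kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in (0, 1]")
    return kernel_seconds_per_step / kernel_fraction * steps


if __name__ == "__main__":
    # Peak single-GPU figures reported in the abstract.
    t_p100 = seconds_per_step(256, 256, 256, 2880.0)
    t_k80 = seconds_per_step(384, 384, 384, 801.6)
    print(f"P100, 256^3: {t_p100 * 1e3:.3f} ms per step")
    print(f"K80,  384^3: {t_k80 * 1e3:.3f} ms per step")

    # The kernel fraction below is a placeholder, not a value from the paper.
    total = estimate_total_seconds(t_p100, kernel_fraction=0.9, steps=10000)
    print(f"Estimated total for 10000 steps: {total:.1f} s")
```

In practice, the per-step kernel time and the fraction would come from profiling LBCollProp at the problem size of interest, since both vary with the lattice size.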
Source
Computers and Applied Chemistry (《计算机与应用化学》)
CAS
2017, No. 10, pp. 739-748 (10 pages)
Funding
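National Natural Science Foundation of China (91434113, 51776212)
Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDB-SSW-SYS029)
National Basic Research Program of China (973 Program) (2015CB251402)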