Abstract
Numerical simulation of turbulent combustion is a key tool for aero-engine design. Because high-accuracy models are needed to solve the Navier-Stokes (NS) equations, such simulations require an enormous amount of computation. Introducing physicochemical models also makes the flow field extremely complex, so that load balancing within the computational domain becomes the bottleneck for large-scale parallel computing. To address this, we port and optimize a turbulent combustion simulation method on a single powerful server, the DGX-2. We design a thread-assignment scheme for the flux calculation and use the Roofline model to analyze and guide the optimization. In addition, we design an efficient data communication scheme and, using the high-speed interconnect of the DGX-2, implement a multi-GPU parallel version of the method. Experimental results show that, compared with the parallel version on a dual-socket Intel Xeon 6248 CPU node with 40 cores, the computational part of the iteration runs 8.1x faster on a single V100 GPU and reaches a 66.1x speedup on the 16 V100 GPUs of the DGX-2, exceeding the best performance achievable by the CPU parallel version.
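As a rough illustration of the two quantitative ideas in the abstract, the sketch below evaluates a Roofline bound and the multi-GPU scaling implied by the reported speedups. It is not the authors' analysis. The V100 peak (7800 GFLOP/s FP64) and memory bandwidth (900 GB/s) are assumed nominal figures, the arithmetic intensity of 0.5 FLOP/byte is a hypothetical value chosen only for the example, and the function names are invented here.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: performance is capped by either the compute peak
    or by memory bandwidth times arithmetic intensity (FLOP/byte)."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


def parallel_efficiency(single_speedup, multi_speedup, n_devices):
    """Scaling efficiency from one device to n_devices, with both
    speedups measured against the same CPU baseline."""
    return (multi_speedup / single_speedup) / n_devices


# Assumed nominal V100 figures (not taken from the abstract).
V100_PEAK_GFLOPS = 7800.0
V100_BANDWIDTH_GBS = 900.0

# Hypothetical arithmetic intensity for a flux kernel (illustrative only).
flux_kernel_ai = 0.5

bound = attainable_gflops(flux_kernel_ai, V100_PEAK_GFLOPS, V100_BANDWIDTH_GBS)
ridge = ridge_point(V100_PEAK_GFLOPS, V100_BANDWIDTH_GBS)
print(f"Roofline bound at AI={flux_kernel_ai}: {bound:.0f} GFLOP/s")
print(f"Ridge point: {ridge:.2f} FLOP/byte")

# Speedups reported in the abstract: 8.1x on one V100, 66.1x on 16 V100s.
eff = parallel_efficiency(8.1, 66.1, 16)
print(f"Scaling efficiency from 1 to 16 GPUs: {eff:.1%}")
```

A kernel whose arithmetic intensity falls below the ridge point is memory-bound under this model, so reducing memory traffic would help it more than reducing floating-point work. The efficiency figure is plain arithmetic on the two reported speedups, which cover only the computational part of the iteration.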
Authors
文敏华
汪申鹏
韦建文
李林颖
张斌
林新华
WEN Min-hua; WANG Shen-peng; WEI Jian-wen; LI Lin-ying; ZHANG Bin; LIN Xin-hua (Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China; School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China)
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2021, Issue 12, pp. 43-48 (6 pages)
Computer Science
Funding
National Key Research and Development Program of China (2016YFB0201800).