Abstract
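Numerical simulation of atmospheric dynamics is widely used in fields such as weather forecasting. High-accuracy, high-resolution forecasts depend on running these simulations on supercomputers. Implicit solvers are not restricted by stability conditions, which gives them an advantage over explicit solvers. It is therefore necessary to study how to parallelize and optimize the operators used in implicit atmospheric dynamics solvers for the architectural features of new supercomputers. Within the theoretical framework of uniform recurrence relations, this paper characterizes the sparse triangular solve (substitution) and the ILU matrix factorization used in the preconditioning stage of atmospheric dynamics problems. Taking the architecture of the SW26010Pro processor into account, it generalizes existing parallel algorithms for structured sparse triangular linear systems and designs an algorithmic framework for unidirectional uniform recurrence relations, which parallelizes and accelerates the various operators of the preconditioning stage. The paper also ports and optimizes the stencil computations and other operators of the atmospheric dynamics problem for the SW26010Pro processor. Experimental results show that the framework achieves speedups of 26× to 33× on the preconditioning operators, and that the optimized stencil and other operators run 10× to 152× faster than serial code. On the new Sunway supercomputer, tests scaled to more than 17 million cores and reached a floating-point performance of 20.5 PFlop/s. In large-scale tests, strong (weak) scaling efficiency stays above 56.81% (41.87%).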
Numerical simulation of atmospheric models has been widely applied to weather prediction. Supercomputers play an important role in improving the accuracy and resolution of weather forecasts. An implicit solver for an atmospheric model relies on many computational kernels, and their parallelization and optimization are critical to solver performance. Based on uniform recurrence relations (URR), this paper introduces a general framework to model the behavior of the sparse triangular solve (SpTRSV) and incomplete LU factorization (ILU). For SpTRSV and ILU, the extended framework achieves speedups of 26× and 33×. This paper also parallelizes the kernel functions in the numerical simulation of the atmospheric model, with speedups of 10× to 152× over serial implementations. The simulation achieves a sustained aggregate performance of 20.5 PFlop/s in double precision, with strong- and weak-scaling efficiencies above 56.81% and 41.87%, respectively.
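To make "sparse triangular solve as a uniform recurrence relation" concrete, here is a minimal sketch, not the authors' implementation. It assumes a lower-triangular factor coming from a 2D five-point stencil in lexicographic order, so each unknown x[i, j] depends only on x[i, j-1] and x[i-1, j]: a URR with constant dependence vectors (0, 1) and (1, 0). Points on one anti-diagonal i + j = d are then mutually independent and can be updated together (a wavefront, or hyperplane, schedule). The function names and the coefficient arrays `diag`, `west`, `south` are hypothetical, and NumPy vectorization only stands in for parallel execution.

```python
"""Sketch (hypothetical names): a structured lower-triangular solve L x = b
treated as a uniform recurrence and evaluated in wavefront order."""

import numpy as np


def sptrsv_sequential(diag, west, south, b):
    """Forward substitution in lexicographic order (j varies fastest).
    Row (i, j) has diagonal diag[i, j], coupling west[i, j] to (i, j-1)
    and coupling south[i, j] to (i-1, j)."""
    ny, nx = b.shape
    x = np.zeros_like(b)
    for i in range(ny):
        for j in range(nx):
            s = b[i, j]
            if j > 0:
                s -= west[i, j] * x[i, j - 1]
            if i > 0:
                s -= south[i, j] * x[i - 1, j]
            x[i, j] = s / diag[i, j]
    return x


def sptrsv_wavefront(diag, west, south, b):
    """Same recurrence, but all points with i + j == d are computed at once,
    because they depend only on the previous anti-diagonal d - 1."""
    ny, nx = b.shape
    x = np.zeros_like(b)
    for d in range(ny + nx - 1):
        i = np.arange(max(0, d - nx + 1), min(ny, d + 1))
        j = d - i
        s = b[i, j].copy()
        has_w = j > 0  # points that have a west neighbour
        s[has_w] -= west[i[has_w], j[has_w]] * x[i[has_w], j[has_w] - 1]
        has_s = i > 0  # points that have a south neighbour
        s[has_s] -= south[i[has_s], j[has_s]] * x[i[has_s] - 1, j[has_s]]
        x[i, j] = s / diag[i, j]
    return x


def dense_matrix(diag, west, south):
    """Assemble the dense lower-triangular L, used only for checking."""
    ny, nx = diag.shape
    L = np.zeros((ny * nx, ny * nx))
    for i in range(ny):
        for j in range(nx):
            r = i * nx + j
            L[r, r] = diag[i, j]
            if j > 0:
                L[r, r - 1] = west[i, j]
            if i > 0:
                L[r, r - nx] = south[i, j]
    return L


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ny, nx = 6, 8
    diag = 4.0 + rng.random((ny, nx))
    west = -rng.random((ny, nx))
    south = -rng.random((ny, nx))
    b = rng.random((ny, nx))
    x_seq = sptrsv_sequential(diag, west, south, b)
    x_wave = sptrsv_wavefront(diag, west, south, b)
    x_ref = np.linalg.solve(dense_matrix(diag, west, south), b.ravel())
    print(np.allclose(x_seq, x_wave), np.allclose(x_seq.ravel(), x_ref))
```

The wavefront order exposes at most min(nx, ny) independent updates per step and needs nx + ny - 1 sequential steps. That limited parallelism is one reason more elaborate schemes for structured triangular solves are of interest.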
Authors
Chen Daokun
Liu Fangfang
Yang Chao
Chen Daokun; Liu Fangfang; Yang Chao (University of Chinese Academy of Sciences, Beijing 100049, China; Laboratory of Parallel Software and Computational Science, ISCAS, Beijing 100190, China; State Key Laboratory of Computer Science, ISCAS, Beijing 100190, China; School of Mathematical Sciences, Peking University, Beijing 100871, China)
Source
Journal on Numerical Methods and Computer Applications (数值计算与计算机应用)
2023, No. 2, pp. 198-213 (16 pages)
Funding
Supported by the High-Performance Computing Key Special Project of the National Key R&D Program of China (2020YFB0204601).