Abstract
The atmospheric general circulation model (AGCM) is the most complex component of the Chinese Academy of Sciences' Earth System Model (CAS-ESM), and parallelizing it on today's mainstream many-core heterogeneous platforms is an active research topic in high performance computing (HPC). This paper targets Tend_lin, the adaptation process of the dynamical core, which is the hotspot program of AGCM4.0. Tend_lin was parallelized on the Sunway TaihuLight HPC platform using the Sunway OpenACC programming model, and its performance was improved through loop distribution, loop tiling, the expression of data transfers, and the offloading of function calls to the slave cores. The expression of data transfers under different scenarios is discussed in detail, and the effect of different tile sizes on program performance is compared experimentally. Relative to serial execution on the master core, the multithreaded version of Tend_lin running on a single core group achieved a speedup of more than 6x at both test scales. As the application resolution increases, the many-core processor is utilized more effectively: at scale C, the multi-process version achieved a 69x speedup for the full application.
Authors
傅游
王坦
郭强
高希然
FU You; WANG Tan; GUO Qiang; GAO Xiran (College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; National Supercomputer Center in Jinan, Jinan, Shandong 250101, China; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
Source
《山东科技大学学报(自然科学版)》
CAS
Peking University Core Journals (北大核心)
2019, Issue 2, pp. 90-99 (10 pages)
Journal of Shandong University of Science and Technology(Natural Science)