Large Scalability Method of 2D Computation on Shenwei Many-core (cited by: 1)
Abstract As supercomputers and their programming environments evolve, multilevel parallel programming on heterogeneous architectures is becoming the norm, and the domestically developed Sunway TaihuLight supercomputer is a typical example. Since Sunway TaihuLight went into operation in 2016, many researchers in China and abroad have carried out method studies and application validation on it, building up a fairly rich body of many-core programming and optimization methods for the Shenwei environment. However, when the earth system model CESM is ported to the Shenwei many-core environment, some two-dimensional data computations in the ocean component model POP behave differently at different scales: common many-core optimization methods give good speedup at 1024 processes, but fail at 16,800 processes, where the many-core version runs slower than the original (negative speedup). To address this problem, this paper proposes a parallel computing method based on slave-core partitions. The 64 slave cores in a core group are divided into several disjoint partitions, and code segments that can be computed independently are assigned to different partitions and run simultaneously. This makes effective use of the slave cores' computing capability and lets the computation times of the independent code segments overlap and hide one another. The number of slave cores in each partition, and their core IDs, can be chosen according to the task to be assigned, so that each slave core receives a suitable amount of data and computation. On top of slave-core partitioning, loop fusion and function hoisting are used to increase the parallel granularity of the code, which improves the scalability of 2D data computation at large process counts. In the high-resolution CESM G compset, the simulation speed of the POP component at a scale of 1.1 million cores increases by 0.8 simulated years per day, a clear many-core speedup.
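The slave-core partitioning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: real code on Sunway would be written against the platform's many-core runtime, and the abstract only says that partition sizes and core IDs are chosen to suit each task. The proportional sizing rule, the function names `partition_slave_cores` and `rows_for_core`, and the kernel names and workload figures below are all assumptions made for illustration.

```python
from typing import Dict, List

NUM_SLAVE_CORES = 64  # slave cores per core group, as stated in the abstract


def partition_slave_cores(workloads: Dict[str, int],
                          num_cores: int = NUM_SLAVE_CORES) -> Dict[str, List[int]]:
    """Split core IDs 0..num_cores-1 into disjoint contiguous partitions.

    Each independent kernel gets one partition whose size is roughly
    proportional to its workload (an assumed sizing rule).
    """
    if not workloads or len(workloads) > num_cores:
        raise ValueError("need between 1 and num_cores kernels")
    if any(w <= 0 for w in workloads.values()):
        raise ValueError("workloads must be positive")

    total = sum(workloads.values())
    # Give every kernel one core, then share the remaining cores proportionally.
    spare = num_cores - len(workloads)
    sizes = {name: 1 + (w * spare) // total for name, w in workloads.items()}

    # Integer division may leave a few cores unassigned; hand them
    # to the kernels with the largest workloads first.
    leftover = num_cores - sum(sizes.values())
    for name in sorted(workloads, key=workloads.get, reverse=True)[:leftover]:
        sizes[name] += 1

    # Assign contiguous, non-overlapping core IDs to each partition.
    partitions: Dict[str, List[int]] = {}
    next_id = 0
    for name, size in sizes.items():
        partitions[name] = list(range(next_id, next_id + size))
        next_id += size
    return partitions


def rows_for_core(core_id: int, partition: List[int], num_rows: int) -> range:
    """Block-distribute the rows of a kernel's 2D data over one partition."""
    rank = partition.index(core_id)          # position of this core in its partition
    base, extra = divmod(num_rows, len(partition))
    start = rank * base + min(rank, extra)   # the first `extra` cores take one more row
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # Hypothetical kernels and workloads (e.g. number of grid points to update).
    parts = partition_slave_cores({"kernel_a": 300, "kernel_b": 100, "kernel_c": 100})
    for name, cores in parts.items():
        print(name, "cores", cores[0], "to", cores[-1], "count", len(cores))
```

Because the partitions are disjoint, the three kernels could run at the same time on one core group, each on its own subset of slave cores, instead of each small kernel being spread thinly over all 64 cores in turn.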
Authors ZHUANG Yuan; GUO Qiang; ZHANG Jie; ZENG Yun-hui (Qilu University of Technology (Shandong Academy of Sciences), Jinan 250101, China; Shandong Computer Science Center (National Supercomputer Center in Jinan), Jinan 250101, China; Shandong Provincial Key Laboratory of Computer Networks, Jinan 250101, China)
Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 8, pp. 87-92 (6 pages)
Funding National Key Research and Development Program of China (2016YFB0201100).
Keywords 2D-array computation; Shenwei many-core; Large scalability; Slave-core partition; Parallel granularity