Design and Implementation of the CCFD-KSSolver Component for GPU Architectures
Abstract [Application Background] In high-performance application domains such as computational fluid dynamics and materials science, the solution of large sparse linear systems directly affects application efficiency and accuracy. Heterogeneous many-core architectures have become a defining feature of modern supercomputing systems and a continuing trend. [Methods] This paper designs and implements the linear solver component CCFD-KSSolver for CPU+GPU heterogeneous supercomputing systems. Tailored to the characteristics of heterogeneous architectures, the component implements Krylov subspace solvers and several typical preconditioning methods for block-structured matrices arising from multi-physics problems, and it applies optimizations such as computation-communication overlap, GPU memory access optimization, and CPU-GPU collaborative computing to improve computational efficiency. [Results] Experiments on lid-driven cavity flow show that, with 8 subdomains, Block-ISAI achieves speedups of 20.09× and 3.34× over the CPU and cuSPARSE subdomain solvers, respectively, and scales better. For matrices with order on the scale of one million, KSSolver with the three subdomain solvers attains parallel efficiencies of 83.8%, 55.7%, and 87.4%, respectively, on 8 GPUs. [Conclusions] The solver and preconditioner components were tested on a classical multi-physics application with block structure. The results demonstrate that the components are stable and efficient, and they strongly support high-performance computing applications, typified by numerical simulation in fluid dynamics, on heterogeneous systems.
Authors ZHANG Haoyuan; MA Wenpeng; YUAN Wu; ZHANG Jian; LU Zhonghua (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China; University of Chinese Academy of Sciences, Beijing 100049, China; Xinyang Normal University, Xinyang, Henan 464000, China)
Source Frontiers of Data & Computing (《数据与计算发展前沿》), CSCD, 2024, Issue 1, pp. 68-78 (11 pages)
Funding National Key Research and Development Program of China (2020YFB1709500); Henan Province Key R&D and Promotion Special Project (222102210162).
Keywords GPU; KSSolver; parallel optimization; preconditioner; high-performance computing
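
The abstract pairs a Krylov subspace solver with a preconditioner built from independent subdomain solves. As a rough illustration of that general pattern only, the following minimal sketch runs a preconditioned conjugate gradient iteration with a non-overlapping block-Jacobi preconditioner on a 1D Poisson matrix. It is not the CCFD-KSSolver code and does not reproduce Block-ISAI or any GPU optimization: the test matrix, the exact tridiagonal subdomain solve, and all function names (`matvec_poisson`, `solve_poisson_block`, `make_block_jacobi`, `pcg`) are assumptions chosen for illustration, and everything runs on the CPU in plain Python.

```python
import math


def matvec_poisson(x):
    """Apply the 1D Poisson matrix tridiag(-1, 2, -1) to x (stand-in test matrix)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_poisson_block(rhs):
    """Exact solve of one diagonal block tridiag(-1, 2, -1) via the Thomas algorithm."""
    m = len(rhs)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -1.0 / 2.0
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[m - 1] = d[m - 1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def make_block_jacobi(n, num_blocks):
    """Return a preconditioner that solves each contiguous subdomain independently."""
    size = -(-n // num_blocks)  # ceiling division

    def apply(r):
        z = []
        for start in range(0, n, size):
            z.extend(solve_poisson_block(r[start:start + size]))
        return z

    return apply


def pcg(matvec, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = apply_prec(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: 64 unknowns split into 8 subdomains, mirroring the subdomain count in the abstract.
n = 64
x_true = [1.0] * n
b = matvec_poisson(x_true)
x, iters = pcg(matvec_poisson, b, make_block_jacobi(n, 8))
```

In this sketch each subdomain solve is independent of the others, which is the property that lets a subdomain solver be mapped onto one GPU per subdomain; replacing the exact block solve with an approximate inverse changes only the body of `apply`.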
