GPU-based Algorithm Optimization for Streaming Module of Lattice Boltzmann Method
Abstract The Lattice Boltzmann Method (LBM) is a Computational Fluid Dynamics (CFD) method that operates at the mesoscopic scale. It computes on a large number of discrete lattice points, which makes it well suited to parallelization. A Graphics Processing Unit (GPU) contains a large number of arithmetic logic units and is well suited to large-scale parallel computing, so a GPU-based parallel LBM algorithm can improve computational efficiency. However, in the streaming module of LBM, the computation at each lattice point requires communication with other lattice points, which creates strong data dependence. This study proposes a GPU-based optimization strategy for the LBM streaming module. First, the implementation logic of the streaming step is analyzed, and the three-dimensional model is decomposed by velocity component into several two-dimensional models through dimension reduction, which lowers the complexity of the model. Second, the differences in lattice-point data before and after the streaming computation are analyzed, the communication pattern of the streaming module is identified through data positioning, and the data exchange modes between lattice points are classified. Finally, the two-dimensional models are partitioned into regions according to the classified exchange modes and a new data communication scheme is designed, which eliminates the effect of data dependence and makes the streaming module fully parallel. In tests, the parallel algorithm achieves a speedup of 1.92 at a grid size of 1.3×10^8, indicating good parallel performance. Compared with an algorithm whose streaming module is not parallelized, the proposed strategy improves parallel computing efficiency by 30%.
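To make the data dependence in the streaming step concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a D2Q9 lattice with periodic boundaries (the abstract does not name the lattice or the boundary treatment), stores one two-dimensional plane per discrete velocity, and avoids read/write conflicts by writing to a second buffer. The names `E`, `stream_plane`, `stream`, and `demo` are hypothetical.

```python
# D2Q9 lattice velocities in a common ordering (an assumption; the abstract
# does not specify the lattice model).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]


def stream_plane(plane, ex, ey):
    """Shift one direction's 2D field by (ex, ey) with periodic wrap-around.

    Each output cell reads exactly one input cell and writes go to a fresh
    buffer, so no cell depends on another cell's result and every (x, y)
    could be handled by its own thread.
    """
    nx, ny = len(plane), len(plane[0])
    out = [[0.0] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            out[x][y] = plane[(x - ex) % nx][(y - ey) % ny]
    return out


def stream(f):
    """f[q][x][y] holds one 2D plane per velocity q; planes are independent."""
    return [stream_plane(f[q], ex, ey) for q, (ex, ey) in enumerate(E)]


def demo():
    """Place a marker at (1, 1) in every plane and report where it lands."""
    nx, ny = 4, 3
    f = [[[0.0] * ny for _ in range(nx)] for _ in E]
    for q in range(len(E)):
        f[q][1][1] = 1.0
    g = stream(f)
    return [(x, y)
            for q in range(len(E))
            for x in range(nx)
            for y in range(ny)
            if g[q][x][y] == 1.0]
```

Calling `demo()` shows each plane's marker moving one cell along that plane's velocity. The second buffer doubles memory use. The strategy described in the abstract instead classifies the exchange patterns and partitions each plane into regions, and those details are not given here.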
Authors HUANG Bin; LIU Anjun; PAN Jingshan; TIAN Min; ZHANG Yu; ZHU Guanghui (Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 251013, Shandong, China; High Performance Computing Laboratory, Jinan Institute of Supercomputer Technology, Jinan 251013, Shandong, China; School of Energy Science and Engineering, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China)
Source Computer Engineering (indexed in CAS, CSCD, and the Peking University Core Journals list), 2024, No. 2, pp. 232-238 (7 pages)
Funding National Natural Science Foundation of China (62002186); Key Research and Development Program of Shandong Province (2021RZB01002).
Keywords High Performance Computing (HPC); Lattice Boltzmann Method (LBM); Graphics Processing Unit (GPU); parallel optimization; data rearrangement