期刊文献+

格子玻尔兹曼方法计算程序的循环优化技术研究

Research on Loop Optimization for Lattice Boltzmann Method Computation Program
下载PDF
导出
摘要 格子玻尔兹曼方法(Lattice Boltzmann Method,LBM)在计算流体力学领域中得到广泛应用,但传统的LBM计算程序耗时巨大,如何优化LBM的计算程序具有重要研究意义.现有的优化方法较少关注LBM计算程序中时间步迭代中潜在的大量数据重用收益,造成计算性能损失.本文通过对LBM计算程序核心循环代码进行循环优化,将其巨大的迭代空间划分成满足cache容量的分块,从而提高数据重用性,同时开发粗粒度循环并行性.分块大小在对迭代空间划分时起到了影响性能的关键作用.本文根据LBM的程序特征提出了一种混合的分块大小选择方法——LBM_TSS方法.该方法从LBM计算程序的访存行为、局部性收益、并行效率以及同步开销四个方面进行静态分析,在约束条件限定的搜索空间内进一步对分块大小寻优,从而计算出性能最优的分块大小.本文在一个共享内存多核系统上对LBM_TSS方法的有效性进行了全面的验证和分析.实验结果表明,在最优情况下,采用LBM_TSS方法计算的分块大小所实现的LBM循环优化方法,与其他3种LBM并行优化方法相比,将LBM程序性能提高了16.79%. The Lattice Boltzmann Method(LBM)is widely used in the field of computational fluid dynamics.However,the traditional LBM computation is time-consuming.Therefore,it has great significance to optimize LBM computation program.The existing algorithms pay little attention to the potential large amount of data reuse benefits in time step iterations of the LBM computation program,resulting in the loss of performance.This paper performs loop optimizations on the core loops of LBM computation program to divide the huge iteration space into blocks or tiles that match the cache capacity,thereby improving data reuse and developing coarse-grained loop parallelism.The tile size has an important impact on program performance when partitioning the iteration space.A hybrid tile size selection method,LBM_TSS,is proposed,based on the program characteristics.LBM_TSS performs static analysis from the four aspects of LBM computation program’s memory access behavior,locality benefits,parallel efficiency and synchronization cost to construct constraints of tile sizes.Then,LBM_TSS performs empirical search to tune the best tile size in the constrained search space.The effectiveness of LBM_TSS is fully verified and analyzed on a shared memory multi-core system.Experimental results show that our loop optimization method with the tile size calculated by LBM_TSS improves the performance of LBM program by 16.79%at best,compared with other three LBM parallel optimization methods.
作者 崔元桢 刘松 王倩 伍卫国 CUI Yuan-Zhen;LIU Song;WANG Qian;WU Wei-Guo(School of Computer Science and Technology,Xi’an Jiaotong University,Xi’an 710049)
出处 《计算机学报》 EI CSCD 北大核心 2020年第6期1086-1102,共17页 Chinese Journal of Computers
基金 国家重点研发计划(2017YFB0203003) 国家自然科学基金(91630206,61672423)资助.
关键词 格子玻尔兹曼方法 局部性优化 并行优化 分块大小选择 LBM locality optimization parallel optimization tile size selection
  • 相关文献

参考文献2

二级参考文献30

  • 1Benoit A,Melhem R,Renaud-Goud P,Robert Y.Power-Aware manhattan routing on chip multiprocessors.In:Proc.of the 26th Int’l Parallel and Distributed Processing Symp.(IPDPS 2012).Washington:IEEE Computer Society,2012 189-200.
  • 2Jin H,Jespersen D,Mehrotra P,Biswas R,Huang L,Chapman B.High performance computing using MPI and OpenMP on multi-core parallel systems.Parallel Computing,2011,37(9):562-575.
  • 3Cytron R.Doacross:Beyond vectorization for multiprocessors.In:Proc.of the Int’l Conf.on Parallel Processing.1986.836-844.
  • 4Chen DK,Yew PC.An empirical study on DOACROSS loops.In:Proc.of the Supercomputing.New York:ACM Press,1991.620-632.
  • 5Hurson AR,Lim JT,Kavi KM,Lee B.Parallelization of DOALL and DOACROSS loops-A survey.Advances in Computers,1997,45:53-103.
  • 6Ma L.Tuning pipeline granularity based on feedback directed framework [MS.Thesis].Beijing:Institute of Computing Technology of Chinese Academy of Sciences,2005.
  • 7Lin YT,Wang SC,Shih WL,Hsieh BKY.Enable OpenCL compiler with Open64 infrastructures.In:Proc.of the 13th IEEE Int’l Conf.on High Performance Computing and Communications(HPCC 2011).Washington:IEEE Computer Society,2011.863-868.
  • 8Midkiff SP,Padua DA.Compiler algorithms for synchronization.IEEE Trans.on Computers,1987,C-36(12):1485-1495.
  • 9Wolfe M.Multiprocessor synchronization for concurrent loops.IEEE Software,1988,5(1):34-42.
  • 10Su HM,Yew PC.On data synchronization for multiprocessors.In:Proc.of the 16th Annual Int’l Symp.on Computer Architecture.New York:ACM Press,1989.416-423.

共引文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部