Automatic Mixed Precision Optimization for Stencil Computation
Abstract: Mixed precision has driven many advances in deep learning and in precision tuning and optimization, and extensive research shows that mixed-precision optimization for Stencil computation is also a challenging direction. Meanwhile, the results achieved by the polyhedral model in automatic parallelization indicate that the model provides a good mathematical abstraction for loop nesting, on top of which a series of loop transformations can be performed. Based on polyhedral compilation technology, this study designs and implements an automatic mixed-precision optimizer for Stencil computation. By performing iteration space partitioning, data flow analysis, and scheduling tree transformation at the intermediate representation level, it achieves, for the first time, source-to-source automatic generation of mixed-precision code for Stencil computation. Experiments show that the automatically optimized code reduces precision redundancy while fully exploiting its parallelism potential, which improves program performance. With high-precision computation as the baseline, the maximum speedup is 1.76 and the geometric mean speedup is 1.15 on the x86 platform; on the new-generation domestic Sunway platform, the maximum speedup is 1.64 and the geometric mean speedup is 1.20.
Authors: SONG Guang-Hui (宋广辉); GUO Shao-Zhong (郭绍忠); ZHAO Jie (赵捷); TAO Xiao-Han (陶小涵); LI Fei (李飞); XU Jin-Chen (许瑾晨). Affiliations: Information Engineering University, Zhengzhou 450001, China; State Key Laboratory of Mathematical Engineering and Advanced Computing (Information Engineering University), Zhengzhou 450001, China.
Source: Journal of Software (《软件学报》), 2023, No. 12, pp. 5704-5723 (20 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (U20A20226).
Keywords: automatic mixed precision; Stencil computation; polyhedral model; loop nesting; scheduling tree
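
To make the idea in the abstract concrete, the following is a minimal sketch of a mixed-precision Stencil sweep in which the iteration space is split into a reduced-precision region and a full-precision region. It is an illustration only, not the optimizer described in the paper: the abstract does not state how the partition is chosen, so the `low_precision` range, the 3-point Jacobi kernel, and all function names here are hypothetical. Single precision is emulated by rounding each stored result to IEEE 754 binary32 with the standard `struct` module; the arithmetic itself is still performed in double.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_step(a, low_precision):
    """One sweep of a 3-point Jacobi stencil over the interior points.

    low_precision is a hypothetical partition of the iteration space:
    results at these indices are rounded to single precision, while all
    other points keep full double precision.
    """
    b = list(a)  # boundary values are copied unchanged
    for i in range(1, len(a) - 1):
        v = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        b[i] = to_f32(v) if i in low_precision else v
    return b


def run(a, steps, low_precision):
    """Apply the stencil for a number of time steps."""
    for _ in range(steps):
        a = jacobi_step(a, low_precision)
    return a


n = 64
init = [1.0 / (i + 1) for i in range(n)]

reference = run(init, 10, range(0))      # empty partition: all double
mixed = run(init, 10, range(1, n // 2))  # first half in single precision

# Largest deviation of the mixed-precision result from the double baseline.
max_err = max(abs(x - y) for x, y in zip(reference, mixed))
```

Comparing `mixed` against `reference` shows the trade-off that such an optimizer has to manage: points far enough from the reduced-precision region stay bit-identical to the double baseline, while the error elsewhere is bounded by the accumulated single-precision rounding.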