
Implementation and optimization of half-precision general matrix multiplication (HGEMM) on the ROCm platform
Abstract: To improve Transformer performance on GPU-like accelerators, this work explores performance optimization of half-precision general matrix multiplication (HGEMM), the computational core of Transformer, drawing on optimization experience from single-precision matrix multiplication. The HGEMM kernel is implemented in assembly language. Bottleneck analysis and instruction-flow testing show that, on small matrices, HGEMM has a low compute-to-memory-access ratio and is bandwidth-limited. An optimized HGEMM function is then designed by raising occupancy on the GPU-like accelerator and improving bandwidth utilization, achieving a speedup of 1.1x to 1.3x over the ordinary algorithm. Experimental results show that, by exploiting the characteristics of the half-precision data format, raising occupancy and optimizing instruction arrangement increase the compute-to-memory-access ratio of HGEMM and improve its performance on small matrices.
Authors: WANG Yu-wei (王雨薇); JI Qing (吉青); BU Jing-de (卜景德); GAO Ya (高娅); ZHAO Hong-peng (赵红朋) (Institute of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; Joint Laboratory of Advanced Computing for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2024, No. 8, pp. 2313-2319 (7 pages)
Funding: National Key Research and Development Program of China (2021YFB0300200).
Keywords: GPU-like accelerator; GEMM; half-precision; performance optimization; algorithm implementation; high performance computing; linear algebra
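
The abstract's observation that small-matrix HGEMM has a low compute-to-memory-access ratio and is therefore bandwidth-limited can be illustrated with a simple roofline-style estimate. The sketch below is not taken from the paper: the function names, the assumption that each matrix crosses memory exactly once, and the peak throughput and bandwidth figures are all hypothetical and chosen only for illustration.

```python
def hgemm_arithmetic_intensity(m: int, n: int, k: int,
                               bytes_per_element: int = 2) -> float:
    """FLOPs per byte for C = A @ B with A (m x k), B (k x n), C (m x n).

    Assumption: A and B are each read once and C is written once,
    i.e. perfect on-chip reuse. Half precision uses 2 bytes per element.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_bandwidth_bound(m: int, n: int, k: int,
                       peak_flops: float, peak_bandwidth: float) -> bool:
    """True when the kernel's intensity is below the machine balance."""
    machine_balance = peak_flops / peak_bandwidth  # FLOPs per byte
    return hgemm_arithmetic_intensity(m, n, k) < machine_balance


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s FP16 peak, 1 TB/s memory bandwidth.
    PEAK_FLOPS = 100e12
    PEAK_BANDWIDTH = 1e12
    for size in (64, 256, 1024, 4096):
        intensity = hgemm_arithmetic_intensity(size, size, size)
        bound = is_bandwidth_bound(size, size, size, PEAK_FLOPS, PEAK_BANDWIDTH)
        label = "bandwidth-bound" if bound else "compute-bound"
        print(f"{size:5d}: {intensity:8.1f} FLOP/byte, {label}")
```

Under these assumptions a square problem of size N has an intensity of N/3 FLOPs per byte, so small matrices fall well below the machine balance, which is consistent with the abstract's focus on occupancy and bandwidth utilization rather than raw compute throughput for small sizes.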