Abstract
The dot product is a Level 1 routine in the BLAS library and is called widely in scientific computing and related fields. Because floating-point arithmetic introduces rounding errors, the double-precision dot product in existing BLAS libraries cannot meet the accuracy requirements of some applications, so higher-precision algorithms are needed for more accurate and reliable results. This paper targets the domestic SW1621 (申威1621) platform and adds a high-precision dot product interface to an existing BLAS library to meet these requirements. The high-precision dot product is also hand-optimized at the assembly level using loop unrolling, memory-access optimization, and instruction reordering. Experimental results show that the high-precision dot product achieves roughly twice the precision of the double-precision dot product, effectively improving on the accuracy of the original algorithm. With this accuracy gain preserved, the optimized high-precision dot product function reaches an average speedup of 1.61 over its unoptimized version.
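The abstract does not say which high-precision algorithm the paper uses. As an illustration only, the sketch below assumes a compensated scheme in the style of Dot2 (Ogita, Rump and Oishi), which is one well-known way to obtain a result about as accurate as if it had been computed in twice the working precision. All function names here are hypothetical and are not the interface the paper adds to BLAS; the code is plain Python rather than SW1621 assembly.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def split(a):
    """Veltkamp split of a double into high and low halves (a == hi + lo)."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Return (p, err) with p = fl(a * b) and a * b == p + err,
    barring overflow or underflow (Dekker's product, no FMA needed)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, err


def dot2(x, y):
    """Compensated dot product: accumulate rounding errors separately
    and add them back at the end (assumed Dot2-style scheme)."""
    p = 0.0  # running double-precision sum
    s = 0.0  # running sum of the captured rounding errors
    for xi, yi in zip(x, y):
        h, r = two_prod(xi, yi)   # product and its rounding error
        p, q = two_sum(p, h)      # sum and its rounding error
        s += q + r
    return p + s


def naive_dot(x, y):
    """Plain double-precision dot product, for comparison."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc


def exact_dot(x, y):
    """Exact rational reference value of the dot product."""
    return sum((Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y)), Fraction(0))
```

For the ill-conditioned input `x = [1e16, 1.0, -1e16]`, `y = [1.0, 1.0, 1.0]`, the plain loop loses the middle term to cancellation, while the compensated version recovers it. The extra operations per element are the cost that optimizations such as loop unrolling and instruction reordering aim to reduce.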
Authors
XU Fang-Jie (徐方洁)
WANG Lei (王磊)
WANG Yi-Zhuo (王一卓)
ZHANG Ya-Guang (张亚光)
Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China
Source
Computer Systems & Applications (《计算机系统应用》)
2023, No. 2, pp. 400-405 (6 pages)
Keywords
SW1621 (申威1621)
dot product
high-precision
BLAS library interface
performance optimization