
High-Performance Implementation and Optimization of Trigonometric Functions Based on SIMD (cited by: 3)
Abstract: Trigonometric functions are basic mathematical operations, and implementing them efficiently matters for building the basic software ecosystem of a processor. Because current processors generally adopt SIMD architectures, high-performance SIMD implementations of trigonometric functions have both research significance and practical value. This paper uses numerical analysis methods to implement and optimize five commonly used trigonometric functions: sin, cos, tan, atan, and atan2. First, efficient trigonometric function algorithms are designed based on an analysis of the IEEE 754 floating-point standard. Next, accuracy is improved with polynomial approximation techniques, namely the Taylor formula, Padé approximation, and the Remez algorithm. Finally, performance is further improved through instruction pipelining and SIMD optimization. Experimental results show that, while meeting the accuracy requirements, the implemented functions achieve substantial speedups on the ARM V8 computing platform over both the libm algorithm library and the ARM; algorithm library: 1.77x to 6.26x relative to libm, and 1.34x to 1.5x relative to the ARM; algorithm library.
Authors: YAO Jian-yu (姚建宇); ZHANG Yi-wei (张祎维); ZHANG Guang-ting (张广婷); JIA Hai-peng (贾海鹏) (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 12, pp. 29-35 (7 pages)
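Funding: National Key Research and Development Program of China (2017YFB0202502, 2018YFC0809306, 2017YFB0202105, 2016YFB0200803, 2017YFB0202302); National Natural Science Foundation of China (61972376); Beijing Natural Science Foundation (L182053).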
Keywords: Trigonometric function; SIMD; High performance; Numerical analysis; ARM V8 architecture