
Design and Implementation of a SIMD Floating-Point Divider Based on the SRT-8 Algorithm
Abstract: Division is one of the most commonly used basic operations in scientific computing, digital signal processing, communications, and image processing. Based on the SRT-8 division algorithm, this work designs a SIMD floating-point divider that conforms to the IEEE-754 standard. On the same hardware, the divider performs either one double-precision floating-point division or two parallel single-precision floating-point divisions. The SRT-8 iterative division structure is optimized by carrying out quotient selection and remainder addition in parallel and by adopting a quotient-digit storage technique, which shortens the delay of each division iteration and raises the achievable clock frequency. In addition, a hardware reuse strategy reduces resource overhead and saves area. Experiments show that, with a 40 nm technology library, the synthesized cell area is 18,601.9681 μm² and the operating frequency reaches 2.5 GHz, and that the critical-path delay is reduced by 23.81% compared with a conventional SRT-8 implementation.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2014, Issue 5, pp. 797-803 (7 pages)
Keywords: SRT-8; floating-point divider; double-precision floating-point; SIMD single-precision floating-point
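
For readers unfamiliar with the radix-8 SRT digit recurrence named in the abstract, the following is a minimal behavioral sketch in Python, not the paper's hardware design. It makes several assumptions that do not come from the source: the digit set is {-4, ..., 4}, the quotient digit is chosen by exact rounding rather than by the truncated-estimate lookup table a real divider would use, and the names `srt8_divide` and `select_digit` are hypothetical. Each iteration computes w[j+1] = 8·w[j] − q[j+1]·d and retires three quotient bits.

```python
from fractions import Fraction

RADIX = 8
MAX_DIGIT = 4  # assumed digit set {-4, ..., 4}; not stated in the abstract


def select_digit(shifted, d):
    """Pick the integer nearest to shifted/d, clamped to the digit set.

    A hardware divider would inspect only a few leading bits of the
    remainder and divisor; exact rounding is used here for clarity.
    """
    q = round(shifted / d)
    return max(-MAX_DIGIT, min(MAX_DIGIT, q))


def srt8_divide(x, d, iterations):
    """Approximate x/d for significands x, d in [1, 2).

    Returns (quotient, digits), where digits are the signed radix-8
    quotient digits produced by the recurrence.
    """
    x, d = Fraction(x), Fraction(d)
    assert 1 <= x < 2 and 1 <= d < 2
    bound = Fraction(MAX_DIGIT, RADIX - 1)  # redundancy factor 4/7
    w = x / 4  # pre-scale so that |w| <= (4/7) * d initially
    quotient = Fraction(0)
    digits = []
    for j in range(1, iterations + 1):
        shifted = RADIX * w            # shift the partial remainder
        q = select_digit(shifted, d)   # quotient selection
        w = shifted - q * d            # remainder update
        assert abs(w) <= bound * d     # remainder stays bounded
        quotient += Fraction(q, RADIX ** j)
        digits.append(q)
    return 4 * quotient, digits        # undo the pre-scaling by 1/4


if __name__ == "__main__":
    approx, digits = srt8_divide(Fraction(3, 2), Fraction(5, 4), 18)
    print(float(approx), digits[:6])
```

In this model the quotient selection and the remainder update run one after the other inside the loop; the abstract's optimization concerns overlapping these two steps in hardware, which a sequential software sketch cannot show.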

