
Floating point divider design of high-performance double precision based on Goldschmidt's algorithm (cited by: 3)
Abstract: Double-precision floating-point division is typically complex to compute and has a long latency. To address this, a method is proposed for designing a high-performance double-precision floating-point divider that is based on Goldschmidt's algorithm and supports the IEEE-754 standard. First, the division procedure of Goldschmidt's algorithm and the error introduced by its iterations are analyzed, and a method for controlling that error is proposed. Next, area-efficient bipartite reciprocal lookup tables are used to determine the initial value of the iteration, and the iteration unit uses parallel multipliers to speed up each iteration. Finally, the pipeline stages are partitioned appropriately and the iteration process is controlled by a state machine so that floating-point divisions can execute in a pipelined fashion, which further increases the divider's speed. Experimental results show that, in a 40 nm process, the divider with a 14-bit initial value and a pipelined structure has a synthesized cell area of 84,902.2618 μm² and runs at up to 2.2 GHz. Compared with the pipelined structure that uses an 8-bit initial value, computing speed increases by 32.73% while area increases by 5.05%. A single double-precision floating-point division has a latency of 12 clock cycles; in pipelined execution, the average latency per division is 3 clock cycles. Compared with double-precision floating-point dividers based on the SRT algorithm in other processors, data throughput is improved by a factor of 3 to 7; compared with those based on Goldschmidt's algorithm in other processors, it is improved by a factor of 2 to 3.
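The iteration summarized in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: it uses exact rational arithmetic instead of truncated hardware multipliers, it replaces the bipartite lookup table with a simple truncated reciprocal, and the names `seed_reciprocal`, `goldschmidt_divide` and `iterations_for` are hypothetical. It assumes the divisor has already been normalized to [1, 2), as an IEEE-754 significand would be.

```python
from fractions import Fraction


def seed_reciprocal(d: Fraction, seed_bits: int) -> Fraction:
    """Stand-in for the reciprocal lookup table (an assumption, not the paper's table).

    Returns 1/d truncated so that 0 <= 1 - d * seed < 2**-seed_bits for d in [1, 2).
    """
    scale = 2 ** (seed_bits + 1)
    m = (scale * d.denominator) // d.numerator  # floor(scale / d)
    return Fraction(m, scale)


def goldschmidt_divide(n: Fraction, d: Fraction, seed_bits: int, iterations: int) -> Fraction:
    """Approximate n / d by driving the denominator toward 1."""
    if not (1 <= d < 2):
        raise ValueError("divisor must be normalized to [1, 2)")
    f = seed_reciprocal(d, seed_bits)
    num, den = n * f, d * f          # den = 1 - e, with 0 <= e < 2**-seed_bits
    for _ in range(iterations):
        f = 2 - den                  # correction factor 1 + e
        num, den = num * f, den * f  # den becomes 1 - e**2; num/den stays n/d
    return num


def iterations_for(seed_bits: int, target_bits: int = 53) -> int:
    """Iterations needed if each step doubles the number of correct bits (idealized)."""
    count, bits = 0, seed_bits
    while bits < target_bits:
        bits *= 2
        count += 1
    return count
```

Under this idealized error model, an 8-bit seed needs three iterations to reach the 53-bit double-precision significand width (8, 16, 32, 64 bits), while a 14-bit seed needs two (14, 28, 56 bits). The abstract does not state iteration counts, so this is offered only as a plausible reading of why the 14-bit configuration is reported as faster. A real divider also needs guard bits and a final rounding step, which the sketch ignores.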
Source: Journal of Computer Applications (《计算机应用》; indexed in CSCD and the Peking University Core Journals list), 2015, No. 7, pp. 1854-1857, 1887 (5 pages)
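Funding: Hunan Province Key Discipline Construction Project (434515000008); Aeronautical Science Foundation of China (2013zc88003); National Natural Science Foundation of China (61402499)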
Keywords: floating point divider; Goldschmidt's algorithm; bipartite reciprocal table; high-performance divider; Digital Signal Processing (DSP)

