Design of a High-Performance Double-Precision Floating-Point Divider Based on Goldschmidt's Algorithm (cited by: 3)


Abstract: Double-precision floating-point division is usually complex to compute and has a long latency. To address this, a method is proposed for designing a high-performance double-precision floating-point divider that is based on Goldschmidt's algorithm and supports the IEEE-754 standard. First, the division procedure of Goldschmidt's algorithm and the error introduced by its iterations are analyzed. Then, a method for controlling that error is proposed. Next, area-saving bipartite reciprocal tables are used to determine the initial value of the iteration, and the iteration unit uses parallel multipliers to speed up each iteration. Finally, the pipeline stages are partitioned and the iteration process is controlled by a state machine so that floating-point divisions can execute in a pipelined fashion, which further raises the divider's throughput. Experimental results show that, in a 40 nm process, the divider with a 14-bit initial value and a pipelined structure has a synthesized cell area of 84,902.2618 μm² and runs at up to 2.2 GHz. Compared with a pipelined structure using an 8-bit initial value, computing speed increases by 32.73% and area increases by 5.05%. A single double-precision floating-point division has a latency of 12 clock cycles; under pipelined execution, the average latency per division is 3 clock cycles. Compared with double-precision floating-point dividers based on the SRT algorithm in other processors, data throughput improves by 3 to 7 times. Compared with dividers based on Goldschmidt's algorithm in other processors, data throughput improves by 2 to 3 times.
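The iteration described in the abstract can be illustrated with a minimal sketch. This is not the authors' hardware design: the function names, the midpoint-based seed (a stand-in for the bipartite reciprocal tables) and the use of exact rational arithmetic are all assumptions made for illustration, and the truncation errors that the paper analyzes are not modeled. Under these assumptions the relative error of the quotient squares on every iteration, which is why a more accurate seed needs fewer iterations.

```python
from fractions import Fraction
import math


def seed_reciprocal(d: Fraction, index_bits: int) -> Fraction:
    """Hypothetical seed: approximate 1/d for d in [1, 2).

    d is truncated to `index_bits` fractional bits (as a table index would be)
    and the reciprocal of the interval midpoint is returned. A real table
    would also quantize the returned value; that is not modeled here.
    """
    scale = 1 << index_bits
    index = math.floor(d * scale)
    midpoint = Fraction(2 * index + 1, 2 * scale)
    return 1 / midpoint


def goldschmidt_divide(n: Fraction, d: Fraction,
                       index_bits: int, iterations: int) -> Fraction:
    """Approximate n / d by Goldschmidt iteration in exact arithmetic."""
    x0 = seed_reciprocal(d, index_bits)
    num, den = n * x0, d * x0      # den is now close to 1
    for _ in range(iterations):
        f = 2 - den                # correction factor
        num, den = num * f, den * f  # two independent multiplications
    return num                     # den -> 1, so num -> n / d


def relative_error(n: Fraction, d: Fraction,
                   index_bits: int, iterations: int) -> Fraction:
    """Relative error of the Goldschmidt quotient against the exact n / d."""
    exact = n / d
    return abs(goldschmidt_divide(n, d, index_bits, iterations) - exact) / exact


if __name__ == "__main__":
    n, d = Fraction(3, 2), Fraction(5, 4)
    target = Fraction(1, 2 ** 53)  # below one unit in a 53-bit significand
    for bits in (8, 14):
        for k in range(4):
            err = relative_error(n, d, bits, k)
            print(bits, k, float(err), err < target)
```

In this idealized model the two multiplications inside the loop do not depend on each other, which is the property that lets a hardware iteration unit run them on parallel multipliers. For the sample operands, the 14-bit seed falls below the 2^-53 target after two iterations, while the 8-bit seed needs three.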

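Authors: He Tingting, Peng Yuanxi, Lei Yuanwu

Affiliation: College of Computer, National University of Defense Technology

Source: Journal of Computer Applications (《计算机应用》; indexed in CSCD and the Peking University Core Journals list), 2015, Issue 7, pp. 1854-1857, 1887 (5 pages)

Funding: Hunan Province Key Discipline Construction Project (434515000008); Aeronautical Science Foundation (2013zc88003); National Natural Science Foundation of China (61402499)

Keywords: floating-point divider; Goldschmidt's algorithm; bipartite reciprocal table; high-performance divider; Digital Signal Processing (DSP)

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]; TP342.2 [Automation and Computer Technology: Computer System Architecture]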

