
Design and Implementation of Separated Path Floating-Point Fused Multiply-Add Unit

Cited by: 1
Abstract: A traditional floating-point fused multiply-add (FMA) unit increases the latency of standalone floating-point addition/subtraction and multiplication. To address this shortcoming, a separated path FMA unit (SPFMA) was first designed and implemented. By separating the multiplication and addition paths, SPFMA reduces the latency of standalone multiplication and addition from 6 cycles to 4 cycles while keeping the fused multiply-add latency at 6 cycles. Logic synthesis with a dedicated process cell library shows that SPFMA can run above 1.2 GHz with an area of about 60779.44 µm². Finally, the performance of SPFMA was evaluated by running the SPEC CPU2000 floating-point benchmarks on a hardware emulation accelerator platform. Every floating-point benchmark improved, by up to 5.25% and by 1.61% on average, showing that SPFMA further improves floating-point performance.
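The latency claim can be made concrete with a toy model. The Python sketch below is an illustration only, not the authors' design: it assumes a fully serialized chain of dependent operations (each one waits for the previous result, with no overlap, bypass, or issue effects) and uses only the per-operation latencies stated in the abstract (6 cycles for everything on a traditional FMA unit; 4 cycles for standalone add/multiply and 6 for fused multiply-add on SPFMA). The names `Op`, `TRADITIONAL_FMA`, `SPFMA`, and `chain_latency` are hypothetical.

```python
from enum import Enum


class Op(Enum):
    ADD = "add"  # standalone floating-point add/subtract
    MUL = "mul"  # standalone floating-point multiply
    FMA = "fma"  # fused multiply-add: a * b + c


# Per-operation latencies in cycles, as stated in the abstract.
TRADITIONAL_FMA = {Op.ADD: 6, Op.MUL: 6, Op.FMA: 6}
SPFMA = {Op.ADD: 4, Op.MUL: 4, Op.FMA: 6}


def chain_latency(ops, latency):
    """Cycles to finish a chain in which every op consumes the previous op's result.

    Simplified model (an assumption, not the paper's method): dependent ops
    never overlap, and bypass, issue, and writeback effects are ignored, so
    each op simply adds its full latency.
    """
    return sum(latency[op] for op in ops)


if __name__ == "__main__":
    # A hypothetical dependent sequence: t = x*y; t = t+z; t = t*w+v; t = t+u
    chain = [Op.MUL, Op.ADD, Op.FMA, Op.ADD]
    print("traditional FMA unit:", chain_latency(chain, TRADITIONAL_FMA), "cycles")
    print("SPFMA:", chain_latency(chain, SPFMA), "cycles")
```

For this four-operation chain the model gives 24 cycles on a unit that routes everything through the 6-cycle fused pipeline and 18 cycles on SPFMA, while a chain made only of FMAs is unchanged. The model overstates real gains: the measured SPEC CPU2000 improvements above are far smaller, presumably because real code overlaps independent operations and spends much of its time outside dependent floating-point chains.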
Source: Computer Science (《计算机科学》), CSCD and Peking University core journal, 2013, No. 8, pp. 28-33 (6 pages)
Keywords: floating-point add; floating-point multiply; fused multiply-add; separated path; floating-point performance; operation latency

References (20)

  • [1] Montoye R K, Hokenek E, Runyon S L. Design of the IBM RISC System/6000 Floating-Point Execution Unit[J]. IBM Journal of Research and Development, 1990, 34: 61-62.
  • [2] Eisen L, Ward J W III, Tast H-W, et al. IBM POWER6 Accelerators: VMX and DFU[J]. IBM Journal of Research and Development, 2007, 51: 663-684.
  • [3] Boersma M, Kroener M, Layer C, et al. The POWER7 Binary Floating-Point Unit[C]//Proceedings of IEEE Symposium on Computer Arithmetic. Tübingen, Germany: IEEE Computer Society, 2011.
  • [4] Sharangpani H, Arora K. Itanium Processor Microarchitecture[J]. IEEE Micro, 2000, 20(5): 24-43.
  • [5] Maruyama T, Yoshida T, Kan R, et al. SPARC64 VIIIfx: A New-Generation Octocore Processor for Petascale Computing[J]. IEEE Micro, March-April 2010: 30-40.
  • [6] Glaskowsky P N. NVIDIA's Fermi: The First Complete GPU Computing Architecture, NVIDIA Fermi Whitepaper[EB/OL]. http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.txi, 2012-09-27.
  • [7] IEEE Computer Society. IEEE Standard for Floating-Point Arithmetic[S]. IEEE Std 754-2008. New York, USA, August 2008.
  • [8] Lutz D. Fused Multiply-Add Microarchitecture Comprising Separate Early-Normalizing Multiply and Add Pipelines[C]//Proceedings of IEEE Symposium on Computer Arithmetic. Tübingen, Germany: IEEE Computer Society, 2011.
  • [9] Galal S, Horowitz M. Latency Sensitive FMA Design[C]//Proceedings of IEEE Symposium on Computer Arithmetic. Tübingen, Germany: IEEE Computer Society, 2011.
  • [10] SPEC. CFP2000 (Floating Point Component of SPEC CPU2000)[EB/OL]. http://www.spec.org/cpu2000/CFP2000, 2012-09-27.

