Functional Units Design in Godson-2 Processor (龙芯2号处理器功能部件设计); cited by: 1
Abstract: Functional units are the core of a processor's instruction execution, and their algorithms and implementation directly affect overall processor performance. This paper gives an overview of the functional units in the Godson-2 processor and discusses design methods at several levels, from algorithm to physical design. Godson-2 has two fixed-point units (ALU1 and ALU2) and two floating-point units (FALU1 and FALU2). In addition to the complete MIPS fixed-point and floating-point instruction sets, they implement Godson-2's custom MMX-like multimedia instruction set, and fixed-point operations reuse the data path of the floating-point unit (FPU). The FPU complies with IEEE 754 and the relevant MIPS standards. Floating-point addition has a 4-cycle latency, multiplication a 5-cycle latency, and division a variable latency of 4 to 17 cycles. The physical design, a standard-cell implementation in SMIC 0.18 μm CMOS technology, supports a 500 MHz clock and reaches a peak performance of 2 GFLOPS in single precision and 1 GFLOPS in double precision.
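The peak figures in the abstract can be checked with simple arithmetic. The sketch below is only one breakdown consistent with those numbers. It assumes that each of the two FPUs completes one double-precision operation per cycle and two single-precision operations per cycle (for example through a paired-single format); the abstract does not state this. The function name `peak_gflops` and the per-cycle throughput values are hypothetical.

```python
def peak_gflops(num_fpus: int, clock_mhz: float, flops_per_cycle_per_fpu: int) -> float:
    """Peak throughput in GFLOPS = units * clock (MHz) * flops per cycle / 1000."""
    return num_fpus * clock_mhz * flops_per_cycle_per_fpu / 1000.0


# Values taken from the abstract.
NUM_FPUS = 2        # FALU1 and FALU2
CLOCK_MHZ = 500.0   # target frequency in the 0.18 um process

# Assumed (not stated in the abstract): per-FPU results per cycle.
DOUBLE_FLOPS_PER_CYCLE = 1
SINGLE_FLOPS_PER_CYCLE = 2

double_peak = peak_gflops(NUM_FPUS, CLOCK_MHZ, DOUBLE_FLOPS_PER_CYCLE)
single_peak = peak_gflops(NUM_FPUS, CLOCK_MHZ, SINGLE_FLOPS_PER_CYCLE)

print(f"double precision peak: {double_peak} GFLOPS")
print(f"single precision peak: {single_peak} GFLOPS")
```

Under these assumptions the results match the 1 GFLOPS and 2 GFLOPS figures quoted in the abstract. Other breakdowns, such as counting a fused multiply-add as two operations, could give the same totals.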
Source: Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD, and the Peking University Core Journals list; 2006, No. 6, pp. 967-973 (7 pages).
Funding: National High Technology Research and Development Program of China (863 Program), grants 2002AA111100 and 2002AA110010.
Keywords: Godson-2 processor; functional units design; floating-point units; multimedia instruction set