Functional Units Design in Godson-2 Processor (龙芯2号处理器功能部件设计)

Cited by: 1

Abstract: Functional units are the core of a processor's instruction execution, and their algorithms and implementation directly affect overall processor performance. This paper presents the functional units of the Godson-2 processor and discusses their design at several levels, from algorithm to physical design. Godson-2 has two fixed-point functional units, ALU1 and ALU2, and two floating-point units (FPUs), FALU1 and FALU2. Besides the complete MIPS fixed-point and floating-point instruction sets, the units implement Godson-2's custom MMX-like multimedia instruction set, and fixed-point operations reuse the FPU datapath. The FPU complies with IEEE 754 and the relevant MIPS standards. Floating-point addition completes in 4 cycles, floating-point multiplication in 5 cycles, and floating-point division in 4 to 17 cycles. The physical design, a standard-cell implementation in SMIC 0.18 μm CMOS technology, runs at 500 MHz and reaches a peak performance of 2 GFLOPS in single precision and 1 GFLOPS in double precision.
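The peak figures in the abstract can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the 500 MHz clock and the two FPUs come from the abstract, while the per-cycle throughput of each FPU (one double-precision result, or two single-precision results, per cycle) is an assumption chosen because it reproduces the stated numbers. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(clock_mhz: float, fpu_count: int, ops_per_fpu_per_cycle: int) -> float:
    """Peak rate in GFLOPS = clock (MHz) * units * ops per unit per cycle / 1000."""
    return clock_mhz * fpu_count * ops_per_fpu_per_cycle / 1000.0


CLOCK_MHZ = 500  # from the abstract
FPU_COUNT = 2    # FALU1 and FALU2, from the abstract

# Assumption (not stated in the abstract): each FPU completes one
# double-precision result per cycle, or two single-precision results per cycle.
double_peak = peak_gflops(CLOCK_MHZ, FPU_COUNT, ops_per_fpu_per_cycle=1)
single_peak = peak_gflops(CLOCK_MHZ, FPU_COUNT, ops_per_fpu_per_cycle=2)

print(f"double precision peak: {double_peak} GFLOPS")
print(f"single precision peak: {single_peak} GFLOPS")
```

Under these assumptions the result matches the 1 GFLOPS and 2 GFLOPS quoted in the abstract. Other accounting conventions, such as counting a fused multiply-add as two operations, would give different figures.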

Authors: Zhang Ge, Qi Zichu, Hu Weiwu

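Affiliations: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate School of the Chinese Academy of Sciences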

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 6, pp. 967-973 (7 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National High Technology Research and Development Program of China (863 Program), grants 2002AA111100 and 2002AA110010

Keywords: Godson-2 processor; functional units design; floating-point units; multimedia instruction set

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]

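References (2)

1. Liu Huaping, Hu Weiwu. An optimization method for reducing the latency of the SRT floating-point algorithm [J]. Journal of Computer Research and Development, 2003, 40(11): 1650-1656. Cited by: 2
2. Wei-Wu Hu, Fu-Xin Zhang, Zu-Song Li. Microarchitecture of the Godson-2 Processor [J]. Journal of Computer Science & Technology, 2005, 20(2): 243-249. Cited by: 52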

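Citing articles (1)

1. Wang Ming, Peng Chenglei, Du Sidan. A fast DCT algorithm for the Godson (龙芯) platform and its implementation [J]. Computer Engineering (计算机工程), 2009, 35(17): 223-225. Cited by: 1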
