
Design of a low-power multi-port register file structure
Abstract: To reduce register file power without sacrificing processor performance, a multi-bank register file (MBRF) structure based on read and write queues is proposed. The structure uses several register banks to share the access pressure of the multiple ports and gives each bank its own read queue and write queue. Instruction decomposition splits each instruction into read and write operations that are buffered in these queues, which eliminates the access conflicts a multi-bank structure could otherwise suffer. Two dispatch strategies, combining and forwarding (bypassing), shorten the buffer queues and reduce the number of read and write requests sent to the registers. The structure was evaluated on a four-issue superscalar simulator. The results show that the register file saves 52% of its power overall, while the processor's IPC loss is only 1.6%. Compared with other register files, the queue-based MBRF structure shows a clear advantage in multi-issue processors.
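Source: Journal of Central South University (Science and Technology); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2015, No. 8, pp. 2914-2922 (9 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (2011AA120204); "Twelfth Five-Year" Civil Aerospace Pre-research Project of the State Administration of Science, Technology and Industry for National Defense (YY2011-012(D020201))
Keywords: multi-issue; multi-bank register file; read and write queue; access conflict; instruction decomposition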
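
The queue-based banking idea summarized in the abstract can be illustrated with a small behavioral sketch. This is not the authors' design: the class names, the modulo bank mapping, the limit of one read and one write per bank per cycle, and the default read value of 0 are all assumptions made for illustration. It also assumes that ordering hazards between queued reads and later writes are resolved elsewhere (for example by register renaming).

```python
from collections import deque


class Bank:
    """One register bank with its own read and write queue (hypothetical model)."""

    def __init__(self):
        self.regs = {}          # register number -> stored value
        self.read_q = deque()   # register numbers waiting to be read
        self.write_q = deque()  # (register, value) pairs waiting to be written
        self.accesses = 0       # number of real accesses to the bank storage


class MultiBankRegisterFile:
    def __init__(self, num_banks=4):
        self.banks = [Bank() for _ in range(num_banks)]

    def _bank(self, reg):
        # Assumed mapping: interleave registers across banks by register number.
        return self.banks[reg % len(self.banks)]

    def request_write(self, reg, value):
        """Buffer a write in the write queue of the owning bank."""
        self._bank(reg).write_q.append((reg, value))

    def request_read(self, reg):
        """Return a forwarded value, or None if the read had to be queued."""
        bank = self._bank(reg)
        # Forwarding: the newest pending write to this register supplies the
        # value, so the bank storage is not accessed at all.
        for pending_reg, pending_value in reversed(bank.write_q):
            if pending_reg == reg:
                return pending_value
        # Combining: a read of a register that is already queued is merged
        # with the existing entry instead of occupying a second slot.
        if reg not in bank.read_q:
            bank.read_q.append(reg)
        return None

    def cycle(self):
        """Advance one cycle; each bank serves at most one read and one write."""
        completed = {}
        for bank in self.banks:
            if bank.read_q:
                reg = bank.read_q.popleft()
                completed[reg] = bank.regs.get(reg, 0)  # unwritten registers read as 0
                bank.accesses += 1
            if bank.write_q:
                reg, value = bank.write_q.popleft()
                bank.regs[reg] = value
                bank.accesses += 1
        return completed


if __name__ == "__main__":
    rf = MultiBankRegisterFile(num_banks=4)
    rf.request_write(5, 42)
    print(rf.request_read(5))  # served by forwarding from the write queue
    rf.request_read(1)
    rf.request_read(1)         # combined with the read already queued
    print(rf.cycle())
```

In this sketch, forwarding and combining both reduce the number of entries placed in the queues and the number of accesses to the bank storage, which is the effect the abstract attributes to the two dispatch strategies. The power and IPC figures reported in the abstract come from the authors' simulator and cannot be reproduced with this toy model.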
