A Method to Improve the Throughput of the Instruction Fetch Unit in SMT VLIW Processors

Cited by: 2
Abstract: In a simultaneous multithreaded (SMT) processor, raising the throughput of the instruction fetch unit intensifies cache contention among threads, and this contention in turn limits how far fetch throughput can be raised. Based on the new characteristics of current very long instruction word (VLIW) architectures, this paper proposes a method that improves the throughput of both the instruction fetch unit and the whole processor. By cancelling invalid addresses in the instruction fetch pipeline as early as possible, the method reduces the program cache conflicts caused by invalid instruction fetches and also improves overall processor performance. Experimental results show that the method raises the throughput of both the processor and the instruction fetch unit by 12% to 23% (relative), while the miss rate of the L1 program cache increases only slightly and in some cases even decreases. In addition, it reduces L1 program cache read accesses by 10% to 25%, thereby lowering the power consumption of the processor.
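The abstract describes the mechanism only in prose. Below is a minimal, hypothetical Python sketch of the idea, not the authors' simulator. It assumes a single-issue, four-stage fetch pipeline whose stage roles are loosely modeled on the TMS320C6000 fetch phases (PG, PS, PW, PR), round-robin fetch across threads, an L1 program cache read in the third stage, and taken-branch detection in the fourth. All names (`simulate_fetch`, `taken`, the stage indices, the start addresses) are assumptions made for illustration. When a taken branch is detected, the younger in-flight addresses of the same thread become invalid; with `early_cancel=True` they are dropped before they reach the cache, otherwise they still perform a read. The sketch only counts cache reads and delivered packets; it does not model cache contents, misses, or the reuse of freed fetch slots by other threads.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FetchEntry:
    thread: int         # hardware thread that owns this fetch address
    pc: int             # fetch-packet address
    valid: bool = True  # cleared once a taken branch makes this address wrong-path


def simulate_fetch(
    taken: Dict[Tuple[int, int], int],
    n_threads: int,
    cycles: int,
    early_cancel: bool,
    depth: int = 4,          # assumed stages: PG, PS, PW, PR
    cache_stage: int = 2,    # assumed: program cache is read in PW
    resolve_stage: int = 3,  # assumed: taken branches are detected in PR
    packet_bytes: int = 32,  # one 256-bit fetch packet
) -> Tuple[int, int]:
    """Return (program_cache_reads, useful_packets_delivered).

    `taken` is a hypothetical workload description: it maps (thread, pc) of a
    fetch packet that ends in a taken branch to the branch target address.
    """
    pc = [0x1000 * (t + 1) for t in range(n_threads)]  # hypothetical start PCs
    pipe: List[Optional[FetchEntry]] = [None] * depth
    reads = useful = 0
    for cycle in range(cycles):
        # 1. The oldest entry leaves the pipeline; only valid packets are useful.
        last = pipe[-1]
        if last is not None and last.valid:
            useful += 1
        pipe = [None] + pipe[:-1]

        # 2. Issue one new fetch address per cycle, round-robin over threads.
        t = cycle % n_threads
        pipe[0] = FetchEntry(t, pc[t])
        pc[t] += packet_bytes

        # 3. Program cache access. Without early cancellation, an address that
        #    is already known to be invalid still reads the cache.
        e = pipe[cache_stage]
        if e is not None and (e.valid or not early_cancel):
            reads += 1

        # 4. Branch resolution: redirect the thread and invalidate its younger
        #    in-flight addresses (those in earlier stages).
        e = pipe[resolve_stage]
        if e is not None and e.valid and (e.thread, e.pc) in taken:
            pc[e.thread] = taken[(e.thread, e.pc)]
            for younger in pipe[:resolve_stage]:
                if younger is not None and younger.thread == e.thread:
                    younger.valid = False
    return reads, useful


if __name__ == "__main__":
    branch = {(0, 0x1000): 0x2000}  # thread 0's first packet ends in a taken branch
    print(simulate_fetch(branch, n_threads=1, cycles=8, early_cancel=False))  # (6, 1)
    print(simulate_fetch(branch, n_threads=1, cycles=8, early_cancel=True))   # (4, 1)
```

In the single-thread run, the two wrong-path addresses that were still in the first two stages when the branch resolved are the reads saved; the wrong-path address already in the cache stage reads the cache in both modes, so how early the cancellation can act bounds how much is saved.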
Source: Computer Engineering & Science (CSCD), 2007, No. 6, pp. 97-101 (5 pages)
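Funding: National High Technology Research and Development Program of China (863 Program), grant 2004AA1Z1040; National Natural Science Foundation of China, grant 60473079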
Keywords: simultaneous multithreading (SMT); very long instruction word (VLIW); cache conflict; instruction fetch; invalid address