A Method to Improve the Throughput of the Instruction Fetch Unit in SMT VLIW Processors (cited by: 2)


Abstract: In a simultaneous multithreading (SMT) processor, raising the throughput of the instruction fetch unit intensifies cache contention among threads, and that contention in turn limits how far fetch throughput can rise. Building on the characteristics of current very long instruction word (VLIW) architectures, this paper proposes a method that improves the throughput of both the fetch unit and the processor as a whole. By invalidating useless addresses in the instruction fetch pipeline as early as possible, the method reduces the program cache conflicts caused by invalid fetches and improves overall processor performance. Experimental results show that the throughput of both the processor and the fetch unit improves by 12% to 23% in relative terms, while the miss rate of the level-1 program cache rises only slightly or even falls. The method also eliminates 10% to 25% of level-1 program cache read accesses, which lowers the processor's power consumption.
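The effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's mechanism or its experimental setup: the function name `simulate`, the direct-mapped cache geometry, and the address trace are all hypothetical, and it simply assumes that an address is known to be invalid before it reaches the cache. Under those assumptions it shows how dropping invalid fetch addresses removes both their cache reads and the conflict misses they inflict on valid fetches.

```python
def simulate(requests, num_lines=4, line_size=32, cancel_invalid=False):
    """Toy direct-mapped program cache (hypothetical model, not the paper's).

    requests: list of (address, is_valid) fetch requests in program order.
    Returns (cache_reads, misses_on_valid_fetches).
    """
    tags = [None] * num_lines          # one stored block number per cache line
    reads = 0
    valid_misses = 0
    for addr, is_valid in requests:
        if cancel_invalid and not is_valid:
            continue                   # assumed: dropped before the cache stage
        reads += 1
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:       # miss: fill the line, evicting the old block
            tags[index] = block
            if is_valid:
                valid_misses += 1
    return reads, valid_misses


# Hypothetical trace: the invalid fetches at 0x080 and 0x0A0 map to the same
# cache lines as the valid fetches at 0x000 and 0x020 and evict them.
trace = [
    (0x000, True), (0x080, False), (0x000, True),
    (0x020, True), (0x0A0, False), (0x020, True),
]

baseline = simulate(trace)
with_cancel = simulate(trace, cancel_invalid=True)
print("baseline    (reads, valid misses):", baseline)
print("with cancel (reads, valid misses):", with_cancel)
```

In this made-up trace, cancellation removes two of six cache reads and halves the misses seen by valid fetches. The numbers only illustrate the direction of the effect and say nothing about the magnitudes reported in the paper.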

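Authors: Wan Jianghua (万江华), Chen Shuming (陈书明)

Affiliation: School of Computer Science, National University of Defense Technology

Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, 2007, No. 6, pp. 97-101 (5 pages)

Funding: National 863 Program of China (2004AA1Z1040); National Natural Science Foundation of China (60473079)

Keywords: simultaneous multithreading (SMT); very long instruction word (VLIW); cache conflict; instruction fetch; invalid address

Classification: TP363 [Automation and Computer Technology: Computer System Architecture]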




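Citing documents (2)

1. Guo Yang, Zhen Tizhi, Li Yong. Design and Optimization of the Instruction Control Pipeline of the YHFT-DX High-Performance DSP [J]. Computer Engineering and Applications, 2010, 46(7): 69-71. Cited by: 1
2. Yang Hui, Chen Shuming, Wan Jianghua. A High-Performance Instruction Fetch Pipeline Based on the VLIW DSP Architecture [J]. Journal of National University of Defense Technology, 2011, 33(4): 102-106. Cited by: 1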


