DWarn+: An Enhanced Fetch Policy for SMT Processors (cited by: 3)
Abstract: Simultaneous multithreading (SMT) processors improve performance by executing instructions from multiple threads in the same cycle. Threads that execute simultaneously share the processor's resources but also compete for them. A thread that misses in the L2 cache may hold shared resources for a long time, causing other threads to run more slowly or even stall for lack of resources, which degrades the overall performance of the SMT processor. Many instruction fetch policies have been proposed to reduce the negative impact of L2 cache misses on SMT performance, and DWarn is among the more effective of them. This paper proposes DWarn+, an enhanced fetch policy based on DWarn. Simulation results show that when no more than four threads run simultaneously, DWarn+ clearly outperforms DWarn, whether IPC or Hmean is used as the metric. When more than four threads run simultaneously, the improvement of DWarn+ over DWarn appears mainly on memory-intensive workloads, and the average improvement across all workload types is very limited.
Source: 《小型微型计算机系统》 (Journal of Chinese Computer Systems), indexed in CSCD and the Peking University Core Journals list, 2007, No. 9, pp. 1720-1723 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60376018)
Keywords: simultaneous multithreading (SMT); L2 cache miss; DWarn fetch policy; resource allocation
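The abstract compares the two policies under both IPC and Hmean without defining either metric. The sketch below assumes the definitions commonly used in SMT evaluations: throughput IPC is the sum of the per-thread IPCs measured in SMT mode, and Hmean is the harmonic mean of each thread's IPC relative to its IPC when running alone. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
from statistics import harmonic_mean


def throughput_ipc(smt_ipcs):
    """Total instructions per cycle across all co-running threads."""
    return sum(smt_ipcs)


def hmean_relative_ipc(smt_ipcs, alone_ipcs):
    """Harmonic mean of per-thread relative IPC (SMT IPC / single-thread IPC).

    Assumed definition: a thread that is starved of resources gets a small
    relative IPC, which pulls the harmonic mean down sharply.
    """
    if len(smt_ipcs) != len(alone_ipcs):
        raise ValueError("need one single-thread IPC per SMT thread")
    relative = [smt / alone for smt, alone in zip(smt_ipcs, alone_ipcs)]
    return harmonic_mean(relative)


# Hypothetical two-thread workload; each thread reaches IPC 2.0 when alone.
alone = [2.0, 2.0]
unbalanced = [1.0, 0.5]   # one thread holds most of the shared resources
balanced = [0.75, 0.75]   # resources are split evenly

for name, smt in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, throughput_ipc(smt), round(hmean_relative_ipc(smt, alone), 3))
```

In this made-up case both allocations deliver the same throughput IPC, but the balanced one scores higher on Hmean, which is why reporting both metrics distinguishes raw throughput from fairness between threads.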