摘要
同时多线程(SMT,Simultaneous Multithreading)处理器通过每个周期同时运行来自多个线程的指令来提高性能.同时执行的线程在共享资源的同时也在竞争资源.如果一个发生L2 cache失效的线程长时间占用共享资源,那么会导致其他线程运行速度减慢,甚至会因为缺少资源而停顿下来,从而降低了SMT处理器的总体性能.为了减小L2 cache失效给SMT处理器性能带来的负面影响,许多取指策略被提了出来,DWarn就是其中比较有效的一种.本文在DWarn的基础上进行改进,提出了DWarn+取指策略.模拟结果表明,当同时运行的线程数目不超过4时,无论使用IPC作为度量标准还是使用Hmean作为度量标准,DWarn+都要明显优于DWarn;当同时运行的线程数目大于4时,DWarn+相对于DWarn的提高主要体现在存储器访问密集的工作负载上,而对于所有类型工作负载,DWarn+相对于DWarn的平均提高非常有限.
Simultaneous Multithreading (SMT) processors improve performance by allowing running instructions from several threads simultaneously at a single cycle. These threads executing simultaneously share the processor's resources, but at the same time compete for them. A thread missing in L2 cache may occupy most of available resources for a long time, causing other threads run slower than they could or even stall because of lack of resources. As a result, the overall performance of SMT processors is degraded. To prevent this situation, many instruction fetch policies are proposed. DWarn is among the most efficient fetch policies to handle L2 cache misses. In this paper, an enhanced version of the DWarn policy called DWarn+ is presented. Results show that our policy significantly improves the original one when not more than four threads run, whether using IPC as a metric or using Hmean as a metric. When the number of threads running is higher than 4, DWarn+ enhances the original one mainly for memory bounded workloads, and the average improvement for all types of workloads is very limited.
出处
《小型微型计算机系统》
CSCD
北大核心
2007年第9期1720-1723,共4页
Journal of Chinese Computer Systems
基金
国家自然科学基金项目(60376018)资助