
Quantitative analysis of the emerging multi-core workload memory behavior
Abstract: Workload characterization is a key first step in designing last-level caches (LLCs) for chip multiprocessors. This paper analyzes the memory behavior of a set of memory-intensive, multi-threaded RMS (recognition, mining, and synthesis) workloads for future multi-core processors, including working set sizes, data sharing behavior, and spatial data locality. The analysis shows that these RMS workloads are memory intensive, with large working sets, a significant amount of data sharing, and strong strided access patterns. The LLC design space is then explored for multi-threaded RMS workloads, and potential architectural choices for future multi-core cache design are discussed based on these observations. The experimental results show that large DRAM caches can effectively satisfy the capacity demand of the large working sets, with a 128 MB DRAM cache reducing the average L1 miss penalty by 18% relative to a design without one; that a shared LLC performs better than a private one, with an 8 MB shared cache improving cache performance by 25% over private caches of the same total capacity; and that a stride-based hardware prefetching mechanism improves performance by 25%. Consequently, the recommended cache subsystem for memory-intensive RMS workloads combines a 128 MB DRAM cache, an 8 MB on-die shared SRAM cache, and an 8-entry stride prefetcher.
Source: Journal of Tsinghua University (Science and Technology), 2011, No. 8, pp. 1055-1062, 1071 (9 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
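Funding: National Natural Science Foundation of China (60573100, 60773149); National High-Tech Research and Development ("863") Program of China (2008AA01Z108); National Key Basic Research ("973") Program of China (2007CB310900)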
Keywords: chip multiprocessor; on-chip cache; workload characterization; memory performance; RMS workload
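
The abstract recommends an 8-entry stride prefetcher but does not describe how it works. The following is a minimal sketch of one common textbook form of stride prefetching, a table indexed by the program counter of the load, and is not the design evaluated in the paper. The class name `StridePrefetcher`, the LRU replacement policy, the rule of predicting after the same stride is seen twice, and the addresses in `demo` are all assumptions made for illustration.

```python
from collections import OrderedDict


class StridePrefetcher:
    """Toy PC-indexed stride prefetcher (illustrative; not the paper's design)."""

    def __init__(self, entries=8):
        self.entries = entries
        # pc -> (last_addr, last_stride); dict order tracks recency for LRU.
        self.table = OrderedDict()

    def access(self, pc, addr):
        """Record one access; return a predicted prefetch address or None."""
        prediction = None
        if pc in self.table:
            last_addr, last_stride = self.table.pop(pc)
            stride = addr - last_addr
            # Predict only when the same non-zero stride repeats.
            if stride != 0 and stride == last_stride:
                prediction = addr + stride
        else:
            stride = 0
            if len(self.table) >= self.entries:
                # Table is full: evict the least recently used entry.
                self.table.popitem(last=False)
        # Re-inserting moves this pc to the most recently used position.
        self.table[pc] = (addr, stride)
        return prediction


def demo():
    pf = StridePrefetcher(entries=8)
    # One load (hypothetical pc 0x400) walking memory with a 64-byte stride.
    return [pf.access(0x400, a) for a in (1000, 1064, 1128, 1192)]
```

In this sketch the first two accesses only train the table entry; from the third access onward the prefetcher predicts the next address one stride ahead. A real hardware prefetcher would typically add confidence counters and a prefetch degree, which are omitted here.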

