
The Tradeoff Cache Between Latency and Capacity in Chip Multiprocessors (cited by: 3)
Abstract: Chip multiprocessors (CMPs) have become the mainstream microprocessor architecture. In a CMP, the cache, and especially the last-level cache, is critical to performance and is a focus of current research. The design of the L2 cache in a CMP faces conflicting latency and capacity requirements and must trade off techniques that reduce off-chip misses against techniques that reduce cross-chip misses. A private cache design has low hit latency but reduces the effective cache capacity; a shared cache design increases the effective capacity but incurs long hit latency. This paper proposes a cache design for CMPs, the tradeoff cache between latency and capacity (TCLC). TCLC is a hybrid of the private and shared designs. Its core idea is to identify the sharing type of each cache block dynamically and to optimize each type separately: private blocks through a migration policy, shared read-only blocks through a replication policy, and shared read-write blocks through a center placement policy. The goal is an access latency close to that of the private design and an effective capacity close to that of the shared design, which mitigates the impact of wire delay and reduces the average memory access latency. Full-system simulation results show that TCLC improves performance by 13.7% on average over the private design and by 12% on average over the shared design.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2009, No. 1, pp. 167-175 (9 pages)
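Funding: National Natural Science Foundation of China (60673146, 60603049, 60703017, 60736012); National High Technology Research and Development Program of China (863 Program) (2006AA010201, 2007AA01Z114); National Basic Research Program of China (973 Program) (2005CB321600)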
Keywords: chip multiprocessors; TCLC; L2 cache; replication; migration; center placement
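
The abstract pairs each sharing type with one optimization policy. The mapping can be illustrated with a minimal Python sketch. The abstract does not say how TCLC detects the sharing type in hardware, so the bookkeeping below (a set of accessing cores plus a written flag) and the names `BlockState`, `SharingType`, and `policy_for` are assumptions made for illustration only; only the type-to-policy table comes from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class SharingType(Enum):
    PRIVATE = "private"
    SHARED_READ_ONLY = "shared read-only"
    SHARED_READ_WRITE = "shared read-write"


# Policy applied to each sharing type, as listed in the abstract.
POLICY = {
    SharingType.PRIVATE: "migration",
    SharingType.SHARED_READ_ONLY: "replication",
    SharingType.SHARED_READ_WRITE: "center placement",
}


@dataclass
class BlockState:
    """Hypothetical per-block bookkeeping; not taken from the paper."""
    accessors: set = field(default_factory=set)  # cores that touched the block
    written: bool = False                        # True once any core wrote it

    def record(self, core: int, is_write: bool) -> None:
        self.accessors.add(core)
        self.written = self.written or is_write

    def sharing_type(self) -> SharingType:
        # Assumption: a block seen by at most one core counts as private.
        if len(self.accessors) <= 1:
            return SharingType.PRIVATE
        if self.written:
            return SharingType.SHARED_READ_WRITE
        return SharingType.SHARED_READ_ONLY


def policy_for(accesses):
    """Replay (core, is_write) accesses and return the resulting policy name."""
    state = BlockState()
    for core, is_write in accesses:
        state.record(core, is_write)
    return POLICY[state.sharing_type()]
```

For example, `policy_for([(0, False), (1, False)])` replays two reads from different cores, so the block is classified as shared read-only and mapped to replication. A real design would also need to handle blocks whose type changes over time, which this sketch does not model.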

