BACH: A Bandwidth-Aware Hybrid Cache Hierarchy Design with Nonvolatile Memories

Abstract: Limited main memory bandwidth is becoming a fundamental performance bottleneck in chip multiprocessor (CMP) design. Yet directly increasing the peak memory bandwidth can incur high cost and power consumption. In this paper, we address this problem by proposing BACH, a bandwidth-aware reconfigurable cache hierarchy with hybrid memory technologies. Components of our BACH design include a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our hybrid cache hierarchy chooses different memory technologies with various bandwidth characteristics, such as spin-transfer torque memory (STT-MRAM), resistive memory (ReRAM), and embedded DRAM (eDRAM), to configure each level so that the peak bandwidth of the overall cache hierarchy is optimized. Our reconfiguration mechanism can dynamically adjust the cache capacity of each level based on the predicted bandwidth demands of running workloads. The bandwidth prediction is performed by our prediction engine. We evaluate the system performance gain obtained by the BACH design with a set of multithreaded and multiprogrammed workloads, with and without a system power budget constraint. Compared with a traditional SRAM-based cache design, BACH improves the system throughput by 58% and 14% with multithreaded and multiprogrammed workloads, respectively.
Source: Journal of Computer Science & Technology (计算机科学技术学报(英文版)), indexed in SCIE, EI, CSCD. 2016, Issue 1, pp. 20-35 (16 pages).
Keywords: memory bandwidth, hybrid cache, reconfigurable cache, nonvolatile memory
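
The abstract describes a reconfiguration mechanism that adjusts the capacity of each cache level according to predicted bandwidth demand, but it does not give the algorithm. The following is only an illustrative sketch of that general idea, not the authors' method: the moving-average predictor stands in for the paper's statistical prediction engine, and the names `LevelConfig`, `predict_demand`, and `choose_config`, the candidate settings, and the assumption that larger capacity settings supply lower peak bandwidth are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LevelConfig:
    """One hypothetical setting of a single cache level (values are made up)."""
    capacity_kb: int            # capacity offered by this setting
    peak_bandwidth_gbs: float   # peak bandwidth this setting can supply


def predict_demand(history_gbs: Sequence[float], window: int = 4) -> float:
    """Predict the next interval's bandwidth demand as a moving average.

    Assumption: a simple stand-in for a statistical prediction engine.
    """
    if not history_gbs:
        raise ValueError("need at least one bandwidth sample")
    recent = list(history_gbs)[-window:]
    return sum(recent) / len(recent)


def choose_config(configs: Sequence[LevelConfig], demand_gbs: float) -> LevelConfig:
    """Pick the largest-capacity setting that still meets the predicted demand.

    If no setting meets the demand, fall back to the highest-bandwidth one.
    """
    feasible = [c for c in configs if c.peak_bandwidth_gbs >= demand_gbs]
    if feasible:
        return max(feasible, key=lambda c: c.capacity_kb)
    return max(configs, key=lambda c: c.peak_bandwidth_gbs)


# Hypothetical candidate settings for one cache level.
CANDIDATES = [
    LevelConfig(capacity_kb=256, peak_bandwidth_gbs=64.0),
    LevelConfig(capacity_kb=512, peak_bandwidth_gbs=48.0),
    LevelConfig(capacity_kb=1024, peak_bandwidth_gbs=32.0),
]

if __name__ == "__main__":
    demand = predict_demand([30.0, 40.0, 50.0, 60.0])
    print(demand, choose_config(CANDIDATES, demand))
```

Under these assumed numbers, a workload whose recent demand averages 45 GB/s would be assigned the 512 KB setting, since the 1024 KB setting cannot supply enough bandwidth. A real design would also have to account for reconfiguration cost and the power budget mentioned in the abstract.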