BACH: A Bandwidth-Aware Hybrid Cache Hierarchy Design with Nonvolatile Memories


Abstract: Limited main memory bandwidth is becoming a fundamental performance bottleneck in chip-multiprocessor (CMP) design. Yet directly increasing the peak memory bandwidth can incur high cost and power consumption. In this paper, we address this problem by proposing BACH, a bandwidth-aware reconfigurable cache hierarchy with hybrid memory technologies. Components of our BACH design include a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our hybrid cache hierarchy chooses different memory technologies with various bandwidth characteristics, such as spin-transfer torque memory (STT-MRAM), resistive memory (ReRAM), and embedded DRAM (eDRAM), to configure each level so that the peak bandwidth of the overall cache hierarchy is optimized. Our reconfiguration mechanism can dynamically adjust the cache capacity of each level based on the predicted bandwidth demands of running workloads. The bandwidth prediction is performed by our prediction engine. We evaluate the system performance gain obtained by the BACH design with a set of multithreaded and multiprogrammed workloads, with and without a limit on the system power budget. Compared with a traditional SRAM-based cache design, BACH improves the system throughput by 58% and 14% with multithreaded and multiprogrammed workloads, respectively.

Authors: Jishen Zhao, Cong Xu, Tao Zhang, Yuan Xie

Affiliations: Department of Computer Engineering; Hewlett-Packard Labs; NVIDIA Corporation; Department of Electrical and Computer Engineering

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2016, Issue 1, pp. 20-35 (16 pages)

Keywords: memory bandwidth, hybrid cache, reconfigurable cache, nonvolatile memory

Classification: TP333 [Automation and Computer Technology: Computer System Architecture]; TU318 [Architectural Science: Structural Engineering]
