A VRB Memory Mechanism for Mitigating Multi-Threaded Memory Access Interference (一种缓解多线程访存干扰的VRB内存机制)

Decoupling Contention with VRB Mechanism for Multi-Threaded Applications

Abstract: Processors currently improve system performance by increasing the number of cores and the number of simultaneously running threads. However, increasing the number of cores and threads that share the memory system lowers the memory row-buffer hit rate (RBHR), which raises memory power consumption and lengthens memory access latency. We design and develop a fine-grained victim row-buffer (VRB) memory system to address this problem. The VRB mechanism provides an additional row buffer (the VRB) that temporarily holds data evicted from the row buffer (RB) because of an RB conflict, in case that data is accessed again in the near future. This mitigates multi-threaded interference, increases the reuse of row-buffer data in DRAM, and avoids unnecessary accesses to the cell array, so some row activations, precharge operations, and data transfers are eliminated. VRB improves memory system performance and reduces system power consumption at a small hardware cost. Using full-system cycle-accurate simulation of multi-threaded applications, with an 8-core Intel Xeon server as the baseline, we show that the VRB mechanism improves system-level throughput by up to 17.6% (8.7% on average), improves RBHR by up to 142.9% (51.4% on average), and reduces system power consumption by up to 17.6% (9.2% on average).
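The effect described in the abstract can be illustrated with a toy model of a single DRAM bank. This is only a sketch under stated assumptions, not the authors' design: the class name `Bank`, the `vrb_entries` parameter, the LRU replacement of VRB entries, and the choice to serve a VRB hit without touching the open row are all hypothetical, since the abstract does not specify them. The trace interleaves two threads that keep touching different rows of the same bank, which is the conflict pattern the VRB is meant to absorb.

```python
from collections import OrderedDict


class Bank:
    """Toy model of one DRAM bank: an open-row buffer (RB) plus an optional VRB."""

    def __init__(self, vrb_entries=0):
        self.open_row = None          # row currently held in the RB
        self.vrb = OrderedDict()      # rows evicted from the RB, oldest first (LRU assumed)
        self.vrb_entries = vrb_entries
        self.rb_hits = 0
        self.vrb_hits = 0
        self.activations = 0          # accesses that had to go to the cell array

    def access(self, row):
        if row == self.open_row:
            self.rb_hits += 1
            return "rb_hit"
        if row in self.vrb:
            # Assumption: a VRB hit is served from the VRB and leaves the RB untouched.
            self.vrb.move_to_end(row)
            self.vrb_hits += 1
            return "vrb_hit"
        # RB conflict or cold miss: keep the evicted row in the VRB, then activate the new row.
        if self.open_row is not None and self.vrb_entries > 0:
            self.vrb[self.open_row] = True
            if len(self.vrb) > self.vrb_entries:
                self.vrb.popitem(last=False)   # drop the least recently used victim
        self.open_row = row
        self.activations += 1
        return "activate"


def run(trace, vrb_entries):
    """Replay a list of row numbers against a fresh bank and return it."""
    bank = Bank(vrb_entries)
    for row in trace:
        bank.access(row)
    return bank


# Two threads alternating between row 10 and row 20 of the same bank.
interleaved = [10, 20] * 4
baseline = run(interleaved, vrb_entries=0)
with_vrb = run(interleaved, vrb_entries=1)
print("baseline activations:", baseline.activations)
print("VRB activations:", with_vrb.activations,
      "RB hits:", with_vrb.rb_hits, "VRB hits:", with_vrb.vrb_hits)
```

In this trace every baseline access conflicts with the open row, while the single-entry VRB turns all but the first two accesses into hits. Real workloads, timing, and the paper's actual replacement policy will behave differently; the model only counts events and says nothing about latency or power.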

Authors: Gao Ke (高珂), Fan Dongrui (范东睿), Liu Zhiyong (刘志勇)

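Affiliations: State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences); University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device (Institute of Computing Technology, Chinese Academy of Sciences)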

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the PKU Core Journals list; 2015, No. 11, pp. 2577-2588 (12 pages)

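Funding: National Basic Research Program of China (973 Program) (2011CB302501); National Natural Science Foundation of China (61020106002, 61221062); NSFC and Hong Kong RGC Joint Project (61161160566); National Science and Technology Major Project of China, "Core Electronic Devices, High-End Generic Chips, and Basic Software" (核高基) (2013ZX0102-8001-001-001)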

Keywords: DRAM architecture design; row buffer (RB); power consumption; multi-threaded; victim row-buffer (VRB) mechanism

Classification: TP302 [Automation and Computer Technology - Computer System Architecture]


