Funding: Supported by the National Basic Research 973 Program of China under Grant No. 2005CB321602 and the National Natural Science Foundation of China under Grant No. 60736012.
Abstract: DRAM row buffer conflicts can significantly increase memory access latency. This paper presents a new page-allocation-based optimization that works seamlessly with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts. Validation in simulation, using a set of selected scientific and engineering benchmarks against a few representative memory controller optimizations, shows that our method can reduce row buffer miss rates by up to 76% (37.4% on average). This reduction in row buffer miss rates translates into performance speedups of up to 15% (5% on average).
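To make the role of page placement concrete, the following is a minimal sketch of an open-page row buffer model. It is not the paper's allocator: the `(bank, row)` trace format, the function names, and the two example placements are assumptions chosen for illustration. It only shows why two access streams whose pages land in the same bank keep evicting each other's open row, while the same streams placed in different banks do not.

```python
def row_buffer_miss_rate(accesses):
    """Return the row buffer miss rate for a trace of (bank, row) pairs.

    Assumes an open-page policy: each bank keeps its last activated row
    open, and an access to any other row in that bank counts as a miss.
    """
    open_row = {}  # bank -> row currently held in that bank's row buffer
    misses = 0
    total = 0
    for bank, row in accesses:
        total += 1
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row  # activate the new row
    return misses / total if total else 0.0


def interleave(stream_a, stream_b):
    """Alternate accesses from two streams, as two running threads might."""
    trace = []
    for a, b in zip(stream_a, stream_b):
        trace.append(a)
        trace.append(b)
    return trace


# Hypothetical placement 1: both streams' pages map to bank 0.
same_bank = interleave([(0, 10)] * 8, [(0, 20)] * 8)
# Hypothetical placement 2: the second stream's page maps to bank 1.
diff_bank = interleave([(0, 10)] * 8, [(1, 20)] * 8)

print(row_buffer_miss_rate(same_bank))  # 1.0: every access conflicts
print(row_buffer_miss_rate(diff_bank))  # 0.125: only the two first touches miss
```

In this toy trace the only difference between the two runs is which bank the second stream's page was assigned to, which is the kind of decision a page allocator controls.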
Funding: Supported by the US National Science Foundation (Nos. CCF-1717660 and CNS-1828363).
Abstract: DRAM-based memory suffers from increasing row buffer conflicts, which cause significant performance degradation and power consumption. As memory capacity increases, the overheads of row buffer conflicts grow worse because bitlines become longer, which results in high row activation and precharge latencies. In this work, we propose a practical approach called Row Buffer Cache (RBC) to mitigate row buffer conflict overheads efficiently. At the core of the proposed RBC architecture, rows with good spatial locality are cached and protected, so they are not interrupted by accesses to rows with poor locality. The RBC architecture significantly reduces the performance and energy overheads caused by row activation and precharge, and thus improves overall system performance and energy efficiency. We evaluate the RBC architecture using SPEC CPU2006 on a DDR4 memory, compared to a commodity baseline memory system. Results show that RBC improves overall performance by up to 2.24× (16.1% on average) and reduces memory energy by up to 68.2% (23.6% on average) for single-core simulations. For multi-core simulations, RBC increases overall performance by up to 1.55× (17% on average) and reduces memory energy consumption by up to 35.4% (21.3% on average).
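The idea of caching and protecting high-locality rows can be illustrated with a toy single-bank model. The abstract does not specify how RBC detects locality or manages its cache, so everything here is an assumption made for illustration: the promotion rule (`promote_after` consecutive row buffer hits), the LRU replacement, and the names `count_activations` and `cache_rows`. The sketch only counts row activations with and without a small protected cache.

```python
from collections import OrderedDict


def count_activations(trace, cache_rows=0, promote_after=2):
    """Count row activations in one bank for a trace of row IDs.

    Hypothetical policy (not taken from the paper): a row that collects
    `promote_after` row buffer hits while open is copied into a small
    cache of `cache_rows` entries and is then served without activation.
    """
    open_row = None         # row currently held in the row buffer
    hits_on_open = 0        # row buffer hits since open_row was activated
    cache = OrderedDict()   # protected rows, least recently used first
    activations = 0
    for row in trace:
        if row in cache:
            cache.move_to_end(row)  # served from the cache, bank untouched
            continue
        if row == open_row:
            hits_on_open += 1
            if cache_rows > 0 and hits_on_open >= promote_after:
                if len(cache) >= cache_rows:
                    cache.popitem(last=False)  # evict the LRU protected row
                cache[row] = True
        else:
            activations += 1  # precharge the old row and activate the new one
            open_row = row
            hits_on_open = 0
    return activations


# Row "A" has good locality; rows "X1".."X3" are touched once each.
trace = ["A", "A", "A", "X1", "A", "X2", "A", "X3", "A"]
print(count_activations(trace))                # 7 without a cache
print(count_activations(trace, cache_rows=1))  # 4 with one protected row
```

Without the cache, every single-use row forces row "A" to be reactivated afterwards; with one protected entry, the single-use rows still pay for their own activation but no longer interrupt "A".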