Modern shared-memory multi-core processors typically have shared Level 2 (L2) or Level 3 (L3) caches. Cache bottlenecks and replacement strategies are the main problems of such architectures, where multiple cores try to access the shared cache simultaneously. The main obstacles to improving memory performance are the shared cache architecture and cache replacement. This paper documents the implementation of a Dual-Port Content Addressable Memory (DPCAM) and a modified Near-Far Access Replacement Algorithm (NFRA), previously proposed as a shared L2 cache layer in a multi-core processor. Standard Performance Evaluation Corporation (SPEC) Central Processing Unit (CPU) 2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer. Results show improved performance for the multi-core processor with DPCAM and NFRA, corresponding to a higher number of concurrent accesses to shared memory. The new architecture significantly increases system throughput and records performance improvements of up to 8.7% on various types of SPEC 2006 benchmarks. The miss rate is also improved by about 13%, with some exceptions in the sphinx3 and bzip2 benchmarks. These results could open a new window for solving the long-standing problems with shared caches in multi-core processors.
Multicore systems often use multiple levels of cache to bridge the gap between processor and memory speed. This paper presents a new design of a dedicated pipeline cache memory for multicore processors, called dual-port content addressable memory (DPCAM). In addition, it proposes a new hardware-based replacement algorithm, called the near-far access replacement algorithm (NFRA), to reduce the cost overhead of the cache controller and improve the cache access latency. The experimental results indicated that the latency of write and read operations is significantly lower than that of a set-associative cache memory. Moreover, the latency of a read operation was shown to be nearly constant regardless of the size of the DPCAM. However, an estimate of the power dissipation showed that DPCAM consumes about 7% more power than a set-associative cache memory of the same size. These results encourage embedding DPCAM within multicore processors as a small shared cache memory.
Content Addressable Memory (CAM) is a type of memory used for high-speed search applications. Because of its parallel comparison feature, CAM consumes a large amount of power, caused by the frequent pre-charge and discharge of the match line (ML). In this paper, a CAM with automatic charge balancing and a self-control mechanism is proposed to control the voltage swing of the ML and thereby reduce the power consumption of the CAM. Another technique to reduce power dissipation is the Master-Slave Match Line (MSML), which combines the master-slave architecture with a charge minimization technique. Whereas the conventional design uses only one ML, MSML uses one master ML and several slave MLs to reduce the power dissipated by the match lines. Theoretically, the ML scheme reduces power consumption by up to 50%, independent of the search and match case. Simulation results for MSML, obtained with the Cadence tool, show reduced power consumption in the CAM and in the modified CAM cell.
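The parallel comparison that a CAM performs can be illustrated with a small behavioral model. The following Python sketch is not taken from any of the papers listed here; the class name BehavioralCAM and its methods are hypothetical, and the model ignores timing, pre-charge behavior, and power entirely. It only shows that a search compares the key against every stored word and produces one match-line result per entry.

```python
from typing import List, Optional


class BehavioralCAM:
    """Toy software model of a content-addressable memory (hypothetical names)."""

    def __init__(self, num_entries: int) -> None:
        # None marks an entry that has not been written yet.
        self.words: List[Optional[int]] = [None] * num_entries

    def write(self, index: int, word: int) -> None:
        """Store a word at the given entry index."""
        self.words[index] = word

    def match_lines(self, key: int) -> List[bool]:
        """Return one match result per entry.

        Real hardware evaluates all comparisons at once; the loop here
        only models the outcome, not the parallelism.
        """
        return [w is not None and w == key for w in self.words]

    def search(self, key: int) -> Optional[int]:
        """Return the lowest matching entry index, or None on a miss."""
        for index, hit in enumerate(self.match_lines(key)):
            if hit:
                return index
        return None


cam = BehavioralCAM(num_entries=4)
cam.write(0, 0xA)
cam.write(2, 0xB)
hit_index = cam.search(0xB)   # entry 2 holds 0xB
miss_index = cam.search(0xC)  # no entry holds 0xC
```

Because every entry takes part in every search, each lookup exercises all match lines, which is the behavior the abstract above identifies as the source of the power cost.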