Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to acc...Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to access the shared cache simultaneously.The main problem in improving memory performance is the shared cache architecture and cache replacement.This paper documents the implementation of a Dual-Port Content Addressable Memory(DPCAM)and a modified Near-Far Access Replacement Algorithm(NFRA),which was previously proposed as a shared L2 cache layer in a multi-core processor.Standard Performance Evaluation Corporation(SPEC)Central Processing Unit(CPU)2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer.Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms,corresponding to a higher number of concurrent accesses to shared memory.The new architecture significantly increases system throughput and records performance improvements of up to 8.7%on various types of SPEC 2006 benchmarks.The miss rate is also improved by about 13%,with some exceptions in the sphinx3 and bzip2 benchmarks.These results could open a new window for solving the long-standing problems with shared cache in multi-core processors.展开更多
Multicore systems oftentimes use multiple levels of cache to bridge the gap between processor and memory speed.This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual...Multicore systems oftentimes use multiple levels of cache to bridge the gap between processor and memory speed.This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual port content addressable memory(DPCAM).In addition,it proposes a new replacement algorithm based on hardware which is called a near-far access replacement algorithm(NFRA)to reduce the cost overhead of the cache controller and improve the cache access latency.The experimental results indicated that the latency for write and read operations are significantly less in comparison with a set-associative cache memory.Moreover,it was shown that a latency of a read operation is nearly constant regardless of the size of DPCAM.However,an estimation of the power dissipation showed that DPCAM consumes about 7%greater than a set-associative cache memory of the same size.These results encourage for embedding DPCAM within the multicore processors as a small shared cache memory.展开更多
Content Addressable Memory (CAM) is a type of memory used for high-speed search applications. Due to parallel comparison feature, the CAM memory leads to large power consumption which is caused by frequent pre-charge ...Content Addressable Memory (CAM) is a type of memory used for high-speed search applications. Due to parallel comparison feature, the CAM memory leads to large power consumption which is caused by frequent pre-charge or discharge of match line. In this paper, CAM for automatic charge balancing with self-control mechanism is proposed to control the voltage swing of ML for reducing the power consumption of CAM. Another technique to reduce the power dissipation is to use MSML, it combines the master-slave architecture with charge minimization technique. Unlike the conventional design, only one match line (ML) is used, whereas in Master-Slave Match Line (MSML) one master ML and several slave MLs are used to reduce the power dissipation in CAM caused by match lines (MLs). Theoretically, the match line (ML) reduces the power consumption up to 50% which is independent of search and match case. The simulation results using Cadence tool of MSML show the reduced power consumption in CAM and modified CAM cell.展开更多
软件定义网络(Software Defined Networking,SDN)是一种革命性的网络架构,主要思想是将控制平面与数据平面分离,并且还拥有开放可编程特性。其对数据包转发以及网络资源管理方面有着极高的要求。三态内容寻址存储器(Ternary Content Add...软件定义网络(Software Defined Networking,SDN)是一种革命性的网络架构,主要思想是将控制平面与数据平面分离,并且还拥有开放可编程特性。其对数据包转发以及网络资源管理方面有着极高的要求。三态内容寻址存储器(Ternary Content Addressable Memory,TCAM)因其快速规则匹配能力通常作为规则的缓存应用于SDN交换机中。规则缓存将大部分流量引导到高性能的硬件路径上,可以显著提升网络性能。然而,规则之间存在的依赖关系使得缓存的利用率变低。因此,合理的规则缓存算法对本就稀有的TCAM资源来说十分重要。聚焦规则间的依赖关系,该文提出了CacheBand规则缓存算法。该算法利用包围盒思想,通过对规则及当前流量的分析,智能产生绷带规则,切断了规则间的依赖关系。实验证明,在不同数据包速率下,与同类算法相比,CacheBand可减少约68%的缓存条目,显著降低了流表压力,为数据转发提供了可靠的缓存方案。展开更多
文摘Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to access the shared cache simultaneously.The main problem in improving memory performance is the shared cache architecture and cache replacement.This paper documents the implementation of a Dual-Port Content Addressable Memory(DPCAM)and a modified Near-Far Access Replacement Algorithm(NFRA),which was previously proposed as a shared L2 cache layer in a multi-core processor.Standard Performance Evaluation Corporation(SPEC)Central Processing Unit(CPU)2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer.Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms,corresponding to a higher number of concurrent accesses to shared memory.The new architecture significantly increases system throughput and records performance improvements of up to 8.7%on various types of SPEC 2006 benchmarks.The miss rate is also improved by about 13%,with some exceptions in the sphinx3 and bzip2 benchmarks.These results could open a new window for solving the long-standing problems with shared cache in multi-core processors.
文摘Multicore systems oftentimes use multiple levels of cache to bridge the gap between processor and memory speed.This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual port content addressable memory(DPCAM).In addition,it proposes a new replacement algorithm based on hardware which is called a near-far access replacement algorithm(NFRA)to reduce the cost overhead of the cache controller and improve the cache access latency.The experimental results indicated that the latency for write and read operations are significantly less in comparison with a set-associative cache memory.Moreover,it was shown that a latency of a read operation is nearly constant regardless of the size of DPCAM.However,an estimation of the power dissipation showed that DPCAM consumes about 7%greater than a set-associative cache memory of the same size.These results encourage for embedding DPCAM within the multicore processors as a small shared cache memory.
文摘Content Addressable Memory (CAM) is a type of memory used for high-speed search applications. Due to parallel comparison feature, the CAM memory leads to large power consumption which is caused by frequent pre-charge or discharge of match line. In this paper, CAM for automatic charge balancing with self-control mechanism is proposed to control the voltage swing of ML for reducing the power consumption of CAM. Another technique to reduce the power dissipation is to use MSML, it combines the master-slave architecture with charge minimization technique. Unlike the conventional design, only one match line (ML) is used, whereas in Master-Slave Match Line (MSML) one master ML and several slave MLs are used to reduce the power dissipation in CAM caused by match lines (MLs). Theoretically, the match line (ML) reduces the power consumption up to 50% which is independent of search and match case. The simulation results using Cadence tool of MSML show the reduced power consumption in CAM and modified CAM cell.