Abstract: Modern shared-memory multi-core processors typically have shared Level 2 (L2) or Level 3 (L3) caches. Cache bottlenecks and replacement strategies are the main obstacles to memory performance in such architectures, where multiple cores try to access the shared cache simultaneously. This paper documents the implementation of a Dual-Port Content Addressable Memory (DPCAM) and a modified Near-Far Access Replacement Algorithm (NFRA), which were previously proposed as a shared L2 cache layer in a multi-core processor. Standard Performance Evaluation Corporation (SPEC) Central Processing Unit (CPU) 2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer. Results show improved performance for the multi-core processor with DPCAM and NFRA, corresponding to a higher number of concurrent accesses to shared memory. The new architecture significantly increases system throughput and records performance improvements of up to 8.7% on various types of SPEC 2006 benchmarks. The miss rate is also improved by about 13%, with some exceptions in the sphinx3 and bzip2 benchmarks. These results could open a new window for solving the long-standing problems with shared caches in multi-core processors.
Abstract: Multicore systems often use multiple levels of cache to bridge the gap between processor and memory speed. This paper presents a new design of a dedicated pipeline cache memory for multicore processors called dual-port content addressable memory (DPCAM). In addition, it proposes a new hardware-based replacement algorithm, called the near-far access replacement algorithm (NFRA), to reduce the cost overhead of the cache controller and improve cache access latency. The experimental results indicate that the latencies of write and read operations are significantly lower than in a set-associative cache memory. Moreover, the latency of a read operation is nearly constant regardless of the size of the DPCAM. However, an estimate of the power dissipation shows that DPCAM consumes about 7% more power than a set-associative cache memory of the same size. These results encourage embedding DPCAM within multicore processors as a small shared cache memory.
Abstract: Content Addressable Memory (CAM) is a type of memory used for high-speed search applications. Because of its parallel comparison feature, CAM has high power consumption, caused by frequent pre-charging and discharging of the match line (ML). In this paper, a CAM with automatic charge balancing and a self-control mechanism is proposed to control the voltage swing of the ML and thereby reduce the power consumption of the CAM. Another technique to reduce power dissipation is the Master-Slave Match Line (MSML), which combines the master-slave architecture with a charge minimization technique. Whereas the conventional design uses only one ML, MSML uses one master ML and several slave MLs to reduce the power dissipated by the MLs. Theoretically, the match-line design reduces power consumption by up to 50%, independent of the search and match case. Simulation results for MSML, obtained with the Cadence tool, show reduced power consumption in the CAM and the modified CAM cell.
Funding: Supported by the National Natural Science Foundation of China (No. 60703074) and the National High-Tech Research and Development Program of China (No. 2009AA01Z124).
Abstract: We first study the impact of soft errors on various types of CAM for different feature sizes. After presenting a soft-error-immune CAM cell, SSB-RCAM, we propose two kinds of reliable CAM, DCF-RCAM and DCK-RCAM. In addition, we present an ignore mechanism to protect dual-cell-redundancy CAMs against soft errors. Experimental results indicate that the 11T-NOR CAM cell has an advantage in soft error immunity. Based on 11T-NOR, the proposed reliable CAMs reduce the soft error rate (SER) by about 81% on average with acceptable overheads. The SER of dual-cell-redundancy CAMs can also be decreased using the ignore mechanism in specific applications.
Funding: Funded by the Deanship of Scientific Research at King Khalid University through the General Research Project under Grant Number RGP2/230/44.
Abstract: Unchecked breast cell growth is the cause of breast cancer, one of the leading causes of death in women globally. Early detection and treatment are the only way to avoid breast cancer-related deaths. The proper classification of malignancies is one of the most significant challenges in the medical industry. Because of their high precision and accuracy, machine learning techniques are extensively employed for identifying and classifying various forms of cancer. The author of this review studied and implemented several data mining algorithms and compared their parameters and accuracy for breast cancer diagnosis, so that clinicians might use them to detect cancer cells accurately and early. This article introduces several techniques, including support vector machine (SVM), K star (K*) classifier, Additive Regression (AR), Back Propagation Neural Network (BP), and Bagging. These algorithms are trained using a data set that contains tumor parameters from breast cancer patients. Comparing the results, the author found that Support Vector Machine and Bagging had the highest precision and accuracy, respectively. The review also assesses the number of studies that apply machine learning techniques to breast cancer detection.
Abstract: Recent multi-core architectures may have a relatively large number of cores, typically ranging from tens to hundreds, and are therefore called many-core systems. Such systems require an efficient interconnection network that addresses two major problems. The first is the overhead of power and area cost and its effect on scalability. The second is the high access latency caused by multiple cores accessing the same shared module simultaneously. This paper presents an interconnection scheme called N-conjugate Shuffle Clusters (NCSC), based on a multi-core multi-cluster architecture, to reduce the overhead of these two problems. NCSC eliminates the need for router devices and their complexity and hence reduces the power and area costs. It also redesigns and distributes the shared caches across the interconnection network to increase the capacity for simultaneous access and hence reduce the access latency. For intra-cluster communication, Multi-port Content Addressable Memory (MPCAM) is used. Experimental results using four clusters of four cores each indicate that the average access latency for a write is 1.14785 ± 0.04532 ns, which is nearly equal to the latency of a write operation in MPCAM. Moreover, the average read latency is 1.26226 ± 0.090591 ns within a cluster and around 1.92738 ± 0.139588 ns for read access between cores from different clusters.
Funding: National Key Research and Development Plan of MOST of China, Grant/Award Numbers: 2019YFB2205100, 2021ZD0201201; National Natural Science Foundation of China, Grant/Award Number: 92064012; Hubei Engineering Research Center on Microelectronics; Chua Memristor Institute.
Abstract: Similarity search, that is, finding similar items in massive data, is a fundamental computing problem in many fields such as data mining and information retrieval. However, for large-scale and high-dimensional data, it suffers from high computational complexity and requires tremendous computation resources. Here, based on low-power self-selective memristors, we propose for the first time an in-memory search (IMS) system with two innovative designs. First, by exploiting the natural distribution law of the devices' resistance, a hardware locality-sensitive hashing encoder has been designed to transform real-valued vectors into more efficient binary codes. Second, a compact memristive ternary content addressable memory is developed to calculate the Hamming distances between the binary codes in parallel. Our IMS system demonstrated a 168× energy efficiency improvement over all-transistor counterparts in clustering and classification tasks, while achieving software-comparable accuracy, thus providing a low-complexity and low-power solution for in-memory data mining applications.
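The two steps described in this abstract (hashing real-valued vectors to binary codes, then ranking by Hamming distance) can be illustrated in software. The sketch below uses random-hyperplane locality-sensitive hashing, a standard software technique swapped in for illustration; it is not the memristor-based encoder of the paper. The names `make_hyperplanes`, `lsh_encode`, `hamming`, and `nearest`, as well as the Gaussian hyperplanes and the seed, are assumptions.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (Gaussian components; an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_encode(vector, hyperplanes):
    """Binary code: one bit per hyperplane, set when the dot product is >= 0."""
    return [1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
            for plane in hyperplanes]


def hamming(code_a, code_b):
    """Number of bit positions where the two codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


def nearest(query, items, hyperplanes):
    """Index of the stored item whose code is closest to the query code.

    A TCAM would compare the query against all stored codes in parallel;
    this loop models only the result, not the parallelism.
    """
    q = lsh_encode(query, hyperplanes)
    codes = [lsh_encode(v, hyperplanes) for v in items]
    return min(range(len(items)), key=lambda i: hamming(q, codes[i]))
```

For example, with `planes = make_hyperplanes(16, 3)`, calling `nearest([1.0, 2.0, -0.5], [[-1.0, -2.0, 0.5], [1.0, 2.0, -0.5]], planes)` selects the second item, because its code is identical to the query code.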
Funding: Supported by the National Natural Science Foundation of China (No. 60603049), the National High Technology Research and Development Program of China (Nos. 2008AA110901, 2007AA01Z112, 2009AA01Z125), the State Key Development Program for Basic Research of China (No. 2005CB321600), and the Beijing Natural Science Foundation (No. 4072024).
Abstract: Content addressable memory (CAM) is widely used, and its tests mostly rely on functional fault models. However, functional fault models cannot describe some physical faults exactly. This paper introduces physical fault models for write-only CAM. Two test algorithms that cover 100% of the targeted physical faults are also proposed. The algorithm for a CAM module with an N-bit match output signal needs only 2N+2L+4 comparison operations and 5N write operations, where N is the number of words and L is the word length. The algorithm for a HIT-signal-only CAM module uses 2N+2L+5 comparison operations and 8N write operations. Compared to previous work, the proposed algorithms can test more physical faults at the cost of a few more operations. An experiment on a test chip shows the effectiveness and efficiency of the proposed physical fault models and algorithms.
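To make the operation counts quoted in this abstract concrete, the following sketch evaluates the two formulas for a given CAM size. The function names and the example size (N = 1024 words, L = 32 bits) are assumptions chosen for illustration; only the formulas come from the abstract.

```python
def match_output_test_cost(n_words, word_length):
    """Counts for the CAM with an N-bit match output: 2N+2L+4 compares, 5N writes."""
    return {"comparisons": 2 * n_words + 2 * word_length + 4,
            "writes": 5 * n_words}


def hit_only_test_cost(n_words, word_length):
    """Counts for the HIT-signal-only CAM: 2N+2L+5 compares, 8N writes."""
    return {"comparisons": 2 * n_words + 2 * word_length + 5,
            "writes": 8 * n_words}


if __name__ == "__main__":
    # Hypothetical CAM size, not taken from the paper.
    print(match_output_test_cost(1024, 32))  # {'comparisons': 2116, 'writes': 5120}
    print(hit_only_test_cost(1024, 32))      # {'comparisons': 2117, 'writes': 8192}
```

Both counts grow linearly in N and L, so for this example the HIT-signal-only variant costs one extra comparison and 3N = 3072 extra writes.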
Funding: National Natural Science Foundation of China (No. 60532030).
Abstract: The features of Ternary Content Addressable Memories (TCAMs) make them particularly attractive for IP address lookup and packet classification applications in a router system. However, the limitations of TCAMs impede their utilization. This paper addresses solutions for decreasing power consumption and avoiding entry expansion in range matching. Experimental results demonstrate that the proposed techniques can considerably improve the performance of TCAMs in IP address lookup and packet classification.
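The entry expansion problem mentioned in this abstract arises because a TCAM entry matches a ternary pattern, so an arbitrary range (for example, a port range) generally has to be split into several prefixes. The sketch below shows the standard range-to-prefix split, not the solution proposed in the paper; the function name `range_to_prefixes` and its bit-string output format are assumptions.

```python
def range_to_prefixes(lo, hi, width):
    """Split the integer range [lo, hi] into ternary prefixes of `width` bits.

    Each prefix is a string of '0'/'1' followed by '*' wildcards and would
    occupy one TCAM entry.
    """
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range must satisfy 0 <= lo <= hi < 2**width")
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... and still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        wildcard_bits = size.bit_length() - 1
        fixed_bits = width - wildcard_bits
        head = format(lo >> wildcard_bits, "0{}b".format(fixed_bits)) if fixed_bits else ""
        prefixes.append(head + "*" * wildcard_bits)
        lo += size
    return prefixes


if __name__ == "__main__":
    # The 4-bit range [1, 14] needs six entries.
    print(range_to_prefixes(1, 14, 4))
    # ['0001', '001*', '01**', '10**', '110*', '1110']
```

In the worst case a W-bit range expands to 2W-2 prefixes, which is why avoiding this expansion saves TCAM entries.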
Funding: Supported by the National High-Tech Research and Development Plan of China (No. 2007AA01Z2a1) and the National Grand Fundamental Research 973 Program of China (No. 2007CB307102).
Abstract: PIM-SM (Protocol Independent Multicast-Sparse Mode) is a main multicast routing protocol in IPv6 (Internet Protocol version 6). It can use either a shared tree or a shortest path tree to deliver data packets. Consequently, the multicast IP lookup engine requires, in some cases, two searches to get a correct lookup result according to its multicast forwarding rule, which may in turn require doubling the speed of the lookup engine. The ordinary method to satisfy this requirement in TCAM (Ternary Content Addressable Memory) based lookup engines is to exploit parallelism among multiple TCAMs. However, traditional parallel methods always require more resources and increase design difficulty. In this paper we propose a novel approach to solve this problem. By arranging the multicast forwarding table in class sequence in the TCAM and making full use of the intrinsic characteristics of the TCAM, our approach can get the right lookup result with just one search and a single TCAM, while keeping the hardware of the lookup engine unchanged. Experimental results show that the approach makes it possible to forward IPv6 multicast packets at the full link rate of 20 Gb/s with a single, currently available TCAM chip.
Abstract: An internal structure of Ternary Content Addressable Memory (TCAM) is designed and a Sorting Prefix Block (SPB) algorithm is presented, which is a wire-speed routing lookup algorithm based on TCAM. The SPB algorithm makes full use of the parallelism of TCAM and improves TCAM utilization through optimal partitioning. With the aid of an effective management algorithm and a memory image, SPB separates critical searching from assistant searching and improves search efficiency. A performance test indicates that this algorithm can work with different TCAMs to meet the requirement of wire-speed routing lookup.
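As background for TCAM-based routing lookup, the sketch below models how a TCAM returns the longest matching prefix: entries are stored as value/mask pairs ordered from longest to shortest prefix, and the first hit wins. This is a generic illustration, not the SPB algorithm itself; the names `make_entry`, `build_table`, and `lookup` and the 8-bit address width in the example are assumptions.

```python
def make_entry(prefix_bits, width, next_hop):
    """Build a ternary entry from a bit-string prefix such as '1011'."""
    length = len(prefix_bits)
    # Mask bit 1 means "care", 0 means "don't care".
    mask = ((1 << length) - 1) << (width - length)
    value = (int(prefix_bits, 2) << (width - length)) if length else 0
    return {"value": value, "mask": mask, "length": length, "next_hop": next_hop}


def build_table(routes, width):
    """Order entries longest prefix first, so the first hit is the longest match."""
    entries = [make_entry(prefix, width, hop) for prefix, hop in routes]
    return sorted(entries, key=lambda e: e["length"], reverse=True)


def lookup(table, address):
    """Return the next hop of the first matching entry, or None.

    Hardware compares all entries at once and a priority encoder picks the
    first hit; the loop here reproduces only the outcome.
    """
    for entry in table:
        if (address & entry["mask"]) == entry["value"]:
            return entry["next_hop"]
    return None


if __name__ == "__main__":
    table = build_table([("", "default"), ("10", "A"), ("1011", "B")], width=8)
    print(lookup(table, 0b10110001))  # B
    print(lookup(table, 0b10000001))  # A
    print(lookup(table, 0b01000000))  # default
```

Keeping the table sorted by prefix length is the design choice that lets a single search return the longest match.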
Funding: Supported by Intel Corporation (No. 9078).
Abstract: Packet classification (PC) has become the main method to support quality of service and security in network applications, and two-dimensional prefix packet classification (PPC) is the most popular form. This paper analyzes the problem of rule conflict and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm makes use of the parallelism of TCAM to look up the longest prefix in one instruction cycle. It then uses a memory image and associated data structures to eliminate the conflicts between rules and performs fast two-dimensional PPC. Compared with other algorithms, this algorithm has the lowest time complexity and lower space complexity.
Abstract: To meet future internet traffic challenges, enhancing hardware architectures for network security plays a vital role, because software security algorithms cannot keep up with line rates measured in gigabits per second (Gbps). In this paper, we discuss the signature detection technique (SDT) used in network intrusion detection systems (NIDS). The designs of the most commonly used hardware-based techniques for signature detection, such as finite automata, discrete comparators, the Knuth-Morris-Pratt (KMP) algorithm, content addressable memory (CAM), and the Bloom filter, are discussed. Two novel architectures, XOR-based pre-computation CAM (XPCAM) and the multi-stage look-up technique (MSLT) Bloom filter, are proposed and implemented in a third-party field programmable gate array (FPGA), and their area and power consumption are compared. A 10 Gbps network traffic generator (TNTG) is used to test the functionality and ensure the reliability of the proposed architectures. Our approach involves a unique combination of algorithmic and architectural techniques that outperform some of the current techniques in terms of performance, speed, and power efficiency.
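For readers unfamiliar with the Bloom filter named in this abstract, the sketch below shows the basic structure in software: a bit array and several hash functions, giving membership tests with possible false positives but no false negatives. It is a plain textbook Bloom filter, not the MSLT architecture of the paper; the class name, the default sizes, the SHA-256-based hashing, and the example signature are assumptions.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter over byte strings (illustrative sizes)."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive n_hashes positions by prefixing the item with a hash index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    signatures = BloomFilter()
    signatures.add(b"/etc/passwd")  # hypothetical signature string
    print(signatures.might_contain(b"/etc/passwd"))  # True
```

Because a positive answer may be a false positive, a detection pipeline built on such a filter would normally confirm hits with an exact comparison.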
Abstract: In this paper, we review recent trends in parallel search and artificial intelligence (AI) applications using emerging non-volatile ternary content addressable memory (TCAM). First, the principles and development of four typical emerging memories used to implement non-volatile TCAM are discussed. Then, we analyze the principles and challenges of SRAM-based TCAM and non-volatile TCAM for parallel search. Finally, the research trends and challenges of non-volatile TCAM used for AI applications are presented, which include computer-science-oriented and neuroscience-oriented computing.