Digital design of a digital signal processor involves accurate and high-speed mathematical computation units.DSP units are one of the most power consuming and memory occupying devices.Multipliers are the common buildi...Digital design of a digital signal processor involves accurate and high-speed mathematical computation units.DSP units are one of the most power consuming and memory occupying devices.Multipliers are the common building blocks in most of the DSP units which demands low power and area constraints in the field of portable biomedical devices.This research works attempts multiple power reduction technique to limit the power dissipation of the proposed LUT multiplier unit.A lookup table-based multiplier has the advantage of almost constant area requirement’s irrespective to the increase in bit size of multiplier.Clock gating is usually used to reduce the unnecessary switching activities in idle circlet components.A clock tree structure is employed to enhance the SRAM based lookup table memory architecture.The LUT memory access operation is sequential in nature and instead of address decoder a ring counter is used to scan the memory contents and gated driver tree structure is implemented to control the clock and data switching activities.The proposed algorithm yields 20%of power reduction than existing.展开更多
A halftone watermarking method of high quality, robustness, and capacity flexibility is presented in this paper. An objective halftone image quality evaluation method based on the human visual system obtained by a lea...A halftone watermarking method of high quality, robustness, and capacity flexibility is presented in this paper. An objective halftone image quality evaluation method based on the human visual system obtained by a least-mean-square algorithm is also introduced. In the encoder, the kernels-alternated error diffusion (KAEDF) is applied. It is able to maintain the computational complexity at the same level as ordinary error diffusion. Compared with Hel-Or using ordered dithering, the proposed KAEDF yields a better image quality through using error diffusion. We also propose a weighted lookup table (WLUT) in the decoder instead of lookup table (LUT), as proposed by Pei and Guo, so as to achieve a higher decoded rate. As the experimental results demonstrate, this technique is able to guard against degradation due to tampering, cropping, rotation, and print-and-scan processes in error-diffused halftone images.展开更多
An algorithm is presented for obtaining placements of cell\|based very large scale integrated circuits, subject to timing constraints based on table\|lookup model. A new timing delay model based on some delay tables o...An algorithm is presented for obtaining placements of cell\|based very large scale integrated circuits, subject to timing constraints based on table\|lookup model. A new timing delay model based on some delay tables of fabricators is first simplified and deduced; then it is formulated as a constrained programming problem using the new timing delay model. The approach combines the well\|known quadratic placement with bottom\|up clustering, as well as the slicing partitioning strategy, which has been tested on a set of sample circuits from industry and the results obtained show that it is very promising.展开更多
In order to classify packet, we propose a novel IP classification based the non-collision hash and jumping table trie-tree (NHJTTT) algorithm, which is based on noncollision hash Trie-tree and Lakshman and Stiliadis p...In order to classify packet, we propose a novel IP classification based the non-collision hash and jumping table trie-tree (NHJTTT) algorithm, which is based on noncollision hash Trie-tree and Lakshman and Stiliadis proposing a 2-dimensional classification algorithm (LS algorithm). The core of algorithm consists of two parts: structure the non-collision hash function, which is constructed mainly based on destination/source port and protocol type field so that the hash function can avoid space explosion problem; introduce jumping table Trie-tree based LS algorithm in order to reduce time complexity. The test results show that the classification rate of NHJTTT algorithm is up to 1 million packets per second and the maximum memory consumed is 9 MB for 10 000 rules. Key words IP classification - lookup algorithm - trie-tree - non-collision hash - jumping table CLC number TN 393.06 Foundation item: Supported by the Chongqing of Posts and Telecommunications Younger Teacher Fundation (A2003-03).Biography: SHANG Feng-jun (1972-), male, Ph.D. candidate, lecture, research direction: the smart instrument and network.展开更多
This paper treats the problem of designing an optimal size for a lookup table used for sensor linearization. In small embedded systems the lookup table must be reduced to a minimum in order to reduce the memory footpr...This paper treats the problem of designing an optimal size for a lookup table used for sensor linearization. In small embedded systems the lookup table must be reduced to a minimum in order to reduce the memory footprint and intermediate table values are estimated by linear interpolation. Since interpolation introduces an estimation uncertainty that increases with the sparseness of the lookup table there is a trade-off between lookup table size and estimation precision. This work will present a theory for finding the minimum allowed size of a lookup table that does not affect the overall precision, i.e. the overall precision is determined by the lookup table entries’ precision, not by the interpolation error.展开更多
This paper proposes a power and time efficient scheme for designing IP lookup tables. The proposed scheme uses partitioned Ternary Content Addressable Memories (TCAMs) that store IP lookup tables. The proposed scheme ...This paper proposes a power and time efficient scheme for designing IP lookup tables. The proposed scheme uses partitioned Ternary Content Addressable Memories (TCAMs) that store IP lookup tables. The proposed scheme enables O(1) time penalty for updating an IP lookup table. The partitioned TCAMs allow an update done by a simple insertion without the need for routing table sorting. The organization of the routing table of the proposed scheme is based on a partition with respect to the output port for routing with a smaller priority encoder. The proposed scheme still preserves a similar storage requirement and clock rate to those of existing designs. Furthermore, this scheme reduces power consumption due to using a partitioned routing table.展开更多
To solve the hardware deployment problem caused by the vast demanding computational complexity of convolutional layers and limited hardware resources for the hardware network inference,a look-up table(LUT)-based convo...To solve the hardware deployment problem caused by the vast demanding computational complexity of convolutional layers and limited hardware resources for the hardware network inference,a look-up table(LUT)-based convolution architecture built on a field-programmable gate array using integer multipliers and addition trees is used.With the help of the Winograd algorithm,the optimization of convolution and multiplication is realized to reduce the computational complexity.The LUT-based operator is further optimized to construct a processing unit(PE).Simultaneously optimized storage streams improve memory access efficiency and solve bandwidth constraints.The data toggle rate is reduced to optimize power consumption.The experimental results show that the use of the Winograd algorithm to build basic processing units can significantly reduce the number of multipliers and achieve hardware deployment acceleration,while the time-division multiplexing of processing units improves resource utilization.Under this experimental condition,compared with the traditional convolution method,the architecture optimizes computing resources by 2.25 times and improves the peak throughput by 19.3 times.The LUT-based Winograd accelerator can effectively solve the deployment problem caused by limited hardware resources.展开更多
PIM-SM(Protocol Independent Multicast-Sparse Mode) is a main multicast routing pro-tocol in the IPv6(Internet Protocol version 6).It can use either a shared tree or a shortest path tree to deliver data packets,consequ...PIM-SM(Protocol Independent Multicast-Sparse Mode) is a main multicast routing pro-tocol in the IPv6(Internet Protocol version 6).It can use either a shared tree or a shortest path tree to deliver data packets,consequently the multicast IP lookup engine requires,in some cases,two searches to get a correct lookup result according to its multicast forwarding rule,and it may result in a new requirement of doubling the lookup speed of the lookup engine.The ordinary method to satisfy this requirement in TCAM(Ternary Content Addressable Memory) based lookup engines is to exploit parallelism among multiple TCAMs.However,traditional parallel methods always induce more re-sources and higher design difficulty.We propose in this paper a novel approach to solve this problem.By arranging multicast forwarding table in class sequence in TCAM and making full use of the intrinsic characteristic of the TCAM,our approach can get the right lookup result with just one search and a single TCAM,while keeping the hardware of lookup engine unchanged.Experimental results have shown that the approach make it possible to satisfy forwarding IPv6 multicast packets at the full link rate of 20 Gb/s with just one TCAM with the current TCAM chip.展开更多
The well-known marching cubes method is used to generate isosurfaces from volume data or data on a 3D rectilinear grid. To do so, it refers to a lookup table to decide on the possible configurations of the isosurface ...The well-known marching cubes method is used to generate isosurfaces from volume data or data on a 3D rectilinear grid. To do so, it refers to a lookup table to decide on the possible configurations of the isosurface within a given cube, assuming we know whether each vertex lies inside or outside the surface. However, the vertex values alone do not uniquely determine how the isosurface may pass through the cube, and in particular how it cuts each face of the cube. Earlier lookup tables are deficient in various respects. The possible combinations of the different configurations of such ambiguous faces are used in this paper to find a complete and cor- rect lookup table. Isosurfaces generated using the new lookup table here are guaranteed to be watertight.展开更多
Routing technology has been forced to evolve towards higher capacity and per port packet processing speed. The ability to achieve high forwarding speed is due to either software or hardware technology. TCAM (Ternary C...Routing technology has been forced to evolve towards higher capacity and per port packet processing speed. The ability to achieve high forwarding speed is due to either software or hardware technology. TCAM (Ternary Content Addressable Memory) provides a performance advantage over other software or hardware search algorithms, often resulting in an order of magnitude reduction of search time. But slow updates may affect the performance of TCAM based routing lookup. So the key is to design a table management algorithm, which supports high speed updates in TCAMs. This paper presented three table management algorithms, and then compared their performance. Finally, the optimal one after comparing was given.展开更多
Efficient lookup is essential for peer-to-peer networks and Chord is a representative peer-to-peer lookup scheme based on distributed hash table (DHT). In peer-to-peer networks, each node maintains several unidirectio...Efficient lookup is essential for peer-to-peer networks and Chord is a representative peer-to-peer lookup scheme based on distributed hash table (DHT). In peer-to-peer networks, each node maintains several unidirectional application layer links to other nodes and forwards lookup messages through such links. This paper proposes use of bidirectional links to improve the lookup performance in Chord. Every original unidirectional link is replaced by a bidirectional link, and accordingly every node becomes an anti-finger of all its finger nodes. Both theoretical analyses and experimental results indicate that these anti-fingers can help improve the lookup performance greatly with very low overhead.展开更多
This brief proposes an area and speed efficient implementation of symmetric finite impulse response (FIR) digital filter using reduced parallel look-up table (LUT) distributed arithmetic (DA) based approach. The compl...This brief proposes an area and speed efficient implementation of symmetric finite impulse response (FIR) digital filter using reduced parallel look-up table (LUT) distributed arithmetic (DA) based approach. The complexity lying in the realization of FIR filter is dominated by the multiplier structure. This complexity grows further with filter order, which results in increased area, power, and reduced speed of operation. The speed of operation is improved over multiply-accumulate approach using multiplier less conventional DA based design and decomposed DA based design. Both the structure requires B clock cycles to get the filter output for the input width of B, which limits the speed of DA structure. This limitation is addressed using parallel LUTs, called high speed DA FIR, at the expense of additional hardware cost. With large number of taps, the number of LUTs and its size also becomes large. In the proposed method, by exploiting coefficient symmetry property, the number of LUTs in the decomposed DA form is reduced by a factor of about 2. This proposed approach is applied in high speed DA based FIR design, to obtain area and speed efficient structure. The proposed design offers around 40% less area and 53.98% less slice-delay product (SDP) than the high throughput DA based structure when it’s implemented over Xilinx Virtex-5 FPGA device-XC5VSX95T-1FF1136 for 16-tap symmetric FIR filter. The proposed design on the same FPGA device, supports up to 607 MHz input sampling frequency, and offers 60.5% more speed and 67.71% less SDP than the systolic DA based design.展开更多
The authors present a routing lookup architecture, SDIR(SDRAM based Direct Index Routing). With pipeline and interleaving access technique, SDIR can provide scalable lookup speed from 16 7 MPPS(mega packet per second)...The authors present a routing lookup architecture, SDIR(SDRAM based Direct Index Routing). With pipeline and interleaving access technique, SDIR can provide scalable lookup speed from 16 7 MPPS(mega packet per second) to 133 MPPS with SDRAM running at 133MHz frequency.展开更多
文摘Digital design of a digital signal processor involves accurate and high-speed mathematical computation units.DSP units are one of the most power consuming and memory occupying devices.Multipliers are the common building blocks in most of the DSP units which demands low power and area constraints in the field of portable biomedical devices.This research works attempts multiple power reduction technique to limit the power dissipation of the proposed LUT multiplier unit.A lookup table-based multiplier has the advantage of almost constant area requirement’s irrespective to the increase in bit size of multiplier.Clock gating is usually used to reduce the unnecessary switching activities in idle circlet components.A clock tree structure is employed to enhance the SRAM based lookup table memory architecture.The LUT memory access operation is sequential in nature and instead of address decoder a ring counter is used to scan the memory contents and gated driver tree structure is implemented to control the clock and data switching activities.The proposed algorithm yields 20%of power reduction than existing.
基金supported by National Science Council under Grants No. NSC 99-2631-H-011-001
文摘A halftone watermarking method of high quality, robustness, and capacity flexibility is presented in this paper. An objective halftone image quality evaluation method based on the human visual system obtained by a least-mean-square algorithm is also introduced. In the encoder, the kernels-alternated error diffusion (KAEDF) is applied. It is able to maintain the computational complexity at the same level as ordinary error diffusion. Compared with Hel-Or using ordered dithering, the proposed KAEDF yields a better image quality through using error diffusion. We also propose a weighted lookup table (WLUT) in the decoder instead of lookup table (LUT), as proposed by Pei and Guo, so as to achieve a higher decoded rate. As the experimental results demonstrate, this technique is able to guard against degradation due to tampering, cropping, rotation, and print-and-scan processes in error-diffused halftone images.
基金Project Supported by National Natural Science Foundation of China!( No.697760 2 7) and by973 National Key Project( No.G1 9980)
文摘An algorithm is presented for obtaining placements of cell\|based very large scale integrated circuits, subject to timing constraints based on table\|lookup model. A new timing delay model based on some delay tables of fabricators is first simplified and deduced; then it is formulated as a constrained programming problem using the new timing delay model. The approach combines the well\|known quadratic placement with bottom\|up clustering, as well as the slicing partitioning strategy, which has been tested on a set of sample circuits from industry and the results obtained show that it is very promising.
文摘In order to classify packet, we propose a novel IP classification based the non-collision hash and jumping table trie-tree (NHJTTT) algorithm, which is based on noncollision hash Trie-tree and Lakshman and Stiliadis proposing a 2-dimensional classification algorithm (LS algorithm). The core of algorithm consists of two parts: structure the non-collision hash function, which is constructed mainly based on destination/source port and protocol type field so that the hash function can avoid space explosion problem; introduce jumping table Trie-tree based LS algorithm in order to reduce time complexity. The test results show that the classification rate of NHJTTT algorithm is up to 1 million packets per second and the maximum memory consumed is 9 MB for 10 000 rules. Key words IP classification - lookup algorithm - trie-tree - non-collision hash - jumping table CLC number TN 393.06 Foundation item: Supported by the Chongqing of Posts and Telecommunications Younger Teacher Fundation (A2003-03).Biography: SHANG Feng-jun (1972-), male, Ph.D. candidate, lecture, research direction: the smart instrument and network.
文摘This paper treats the problem of designing an optimal size for a lookup table used for sensor linearization. In small embedded systems the lookup table must be reduced to a minimum in order to reduce the memory footprint and intermediate table values are estimated by linear interpolation. Since interpolation introduces an estimation uncertainty that increases with the sparseness of the lookup table there is a trade-off between lookup table size and estimation precision. This work will present a theory for finding the minimum allowed size of a lookup table that does not affect the overall precision, i.e. the overall precision is determined by the lookup table entries’ precision, not by the interpolation error.
文摘This paper proposes a power and time efficient scheme for designing IP lookup tables. The proposed scheme uses partitioned Ternary Content Addressable Memories (TCAMs) that store IP lookup tables. The proposed scheme enables O(1) time penalty for updating an IP lookup table. The partitioned TCAMs allow an update done by a simple insertion without the need for routing table sorting. The organization of the routing table of the proposed scheme is based on a partition with respect to the output port for routing with a smaller priority encoder. The proposed scheme still preserves a similar storage requirement and clock rate to those of existing designs. Furthermore, this scheme reduces power consumption due to using a partitioned routing table.
基金The Academic Colleges and Universities Innovation Program 2.0(No.BP0719013)。
文摘To solve the hardware deployment problem caused by the vast demanding computational complexity of convolutional layers and limited hardware resources for the hardware network inference,a look-up table(LUT)-based convolution architecture built on a field-programmable gate array using integer multipliers and addition trees is used.With the help of the Winograd algorithm,the optimization of convolution and multiplication is realized to reduce the computational complexity.The LUT-based operator is further optimized to construct a processing unit(PE).Simultaneously optimized storage streams improve memory access efficiency and solve bandwidth constraints.The data toggle rate is reduced to optimize power consumption.The experimental results show that the use of the Winograd algorithm to build basic processing units can significantly reduce the number of multipliers and achieve hardware deployment acceleration,while the time-division multiplexing of processing units improves resource utilization.Under this experimental condition,compared with the traditional convolution method,the architecture optimizes computing resources by 2.25 times and improves the peak throughput by 19.3 times.The LUT-based Winograd accelerator can effectively solve the deployment problem caused by limited hardware resources.
基金Supported by the National High-Tech Research and De-velopment Plan of China (No. 2007AA01Z2a1)the Na-tional Grand Fundamental Research 973 Program of China (No. 2007CB307102)
文摘PIM-SM(Protocol Independent Multicast-Sparse Mode) is a main multicast routing pro-tocol in the IPv6(Internet Protocol version 6).It can use either a shared tree or a shortest path tree to deliver data packets,consequently the multicast IP lookup engine requires,in some cases,two searches to get a correct lookup result according to its multicast forwarding rule,and it may result in a new requirement of doubling the lookup speed of the lookup engine.The ordinary method to satisfy this requirement in TCAM(Ternary Content Addressable Memory) based lookup engines is to exploit parallelism among multiple TCAMs.However,traditional parallel methods always induce more re-sources and higher design difficulty.We propose in this paper a novel approach to solve this problem.By arranging multicast forwarding table in class sequence in TCAM and making full use of the intrinsic characteristic of the TCAM,our approach can get the right lookup result with just one search and a single TCAM,while keeping the hardware of lookup engine unchanged.Experimental results have shown that the approach make it possible to satisfy forwarding IPv6 multicast packets at the full link rate of 20 Gb/s with just one TCAM with the current TCAM chip.
文摘The well-known marching cubes method is used to generate isosurfaces from volume data or data on a 3D rectilinear grid. To do so, it refers to a lookup table to decide on the possible configurations of the isosurface within a given cube, assuming we know whether each vertex lies inside or outside the surface. However, the vertex values alone do not uniquely determine how the isosurface may pass through the cube, and in particular how it cuts each face of the cube. Earlier lookup tables are deficient in various respects. The possible combinations of the different configurations of such ambiguous faces are used in this paper to find a complete and cor- rect lookup table. Isosurfaces generated using the new lookup table here are guaranteed to be watertight.
文摘Routing technology has been forced to evolve towards higher capacity and per port packet processing speed. The ability to achieve high forwarding speed is due to either software or hardware technology. TCAM (Ternary Content Addressable Memory) provides a performance advantage over other software or hardware search algorithms, often resulting in an order of magnitude reduction of search time. But slow updates may affect the performance of TCAM based routing lookup. So the key is to design a table management algorithm, which supports high speed updates in TCAMs. This paper presented three table management algorithms, and then compared their performance. Finally, the optimal one after comparing was given.
文摘Efficient lookup is essential for peer-to-peer networks and Chord is a representative peer-to-peer lookup scheme based on distributed hash table (DHT). In peer-to-peer networks, each node maintains several unidirectional application layer links to other nodes and forwards lookup messages through such links. This paper proposes use of bidirectional links to improve the lookup performance in Chord. Every original unidirectional link is replaced by a bidirectional link, and accordingly every node becomes an anti-finger of all its finger nodes. Both theoretical analyses and experimental results indicate that these anti-fingers can help improve the lookup performance greatly with very low overhead.
文摘This brief proposes an area and speed efficient implementation of symmetric finite impulse response (FIR) digital filter using reduced parallel look-up table (LUT) distributed arithmetic (DA) based approach. The complexity lying in the realization of FIR filter is dominated by the multiplier structure. This complexity grows further with filter order, which results in increased area, power, and reduced speed of operation. The speed of operation is improved over multiply-accumulate approach using multiplier less conventional DA based design and decomposed DA based design. Both the structure requires B clock cycles to get the filter output for the input width of B, which limits the speed of DA structure. This limitation is addressed using parallel LUTs, called high speed DA FIR, at the expense of additional hardware cost. With large number of taps, the number of LUTs and its size also becomes large. In the proposed method, by exploiting coefficient symmetry property, the number of LUTs in the decomposed DA form is reduced by a factor of about 2. This proposed approach is applied in high speed DA based FIR design, to obtain area and speed efficient structure. The proposed design offers around 40% less area and 53.98% less slice-delay product (SDP) than the high throughput DA based structure when it’s implemented over Xilinx Virtex-5 FPGA device-XC5VSX95T-1FF1136 for 16-tap symmetric FIR filter. The proposed design on the same FPGA device, supports up to 607 MHz input sampling frequency, and offers 60.5% more speed and 67.71% less SDP than the systolic DA based design.
文摘The authors present a routing lookup architecture, SDIR(SDRAM based Direct Index Routing). With pipeline and interleaving access technique, SDIR can provide scalable lookup speed from 16 7 MPPS(mega packet per second) to 133 MPPS with SDRAM running at 133MHz frequency.