The interconnect temperature of very large scale integration(VLSI) circuits keeps rising due to self-heating and substrate temperature, which can increase the delay and power dissipation of interconnect wires. The t...The interconnect temperature of very large scale integration(VLSI) circuits keeps rising due to self-heating and substrate temperature, which can increase the delay and power dissipation of interconnect wires. The thermal vias are regarded as a promising method to improve the temperature performance of VLSI circuits. In this paper, the extra thermal vias were used to decrease the delay and power dissipation of interconnect wires of VLSI circuits. Two analytical models were presented for interconnect temperature, delay and power dissipation with adding extra dummy thermal vias. The influence of the number of thermal vias on the delay and power dissipation of interconnect wires was analyzed and the optimal via separation distance was investigated. The experimental results show that the adding extra dummy thermal vias can reduce the interconnect average temperature, maximum temperature, delay and power dissipation. Moreover, this method is also suitable for clock signal wires with a large root mean square current.展开更多
Guest Editor in Chief:Professor Xiao-Ping Zhang, University of Birmingham The potential for renewable energy to make contributions to mitigating the impact of climate change is expected to increase significantly in th...Guest Editor in Chief:Professor Xiao-Ping Zhang, University of Birmingham The potential for renewable energy to make contributions to mitigating the impact of climate change is expected to increase significantly in the longer term. Renewable energy generation technologies including onshore wind, offshore wind, wave,tidal, marine current, and ocean thermal energy generation as well as PV power generation, which are considered展开更多
To achieve high parallel computation of discrete wavelet transform (DWT) in JPEG2000, a high-throughput two-dimensional (2D) 9/7 DWT very large scale integration (VLSI) design is proposed, in which the row proce...To achieve high parallel computation of discrete wavelet transform (DWT) in JPEG2000, a high-throughput two-dimensional (2D) 9/7 DWT very large scale integration (VLSI) design is proposed, in which the row processor is based on flipping structure. Due to the difference of the input data flow, the column processor is obtained by adding the input selector and data buffer to the row processor. Normalization steps in row and column DWT are combined to reduce the number of multipliers, and the rationality is verified. By rearranging the output of four-line row DWT with a multiplexer (MUX), the amount of data processed by each column processor becomes half, and the four-input/four- output architecture is implemented. For an image with the size of N x N, the computing time of one-level 2D 9/7 DWT is 0.25N2 + 1.5N clock cycles. The critical path delay is one multiplier delay, and only 5N internal memory is required. The results of post-route simulation on FPGA show that clock frequency reaches 136 MHz, and the throughput is 544 Msample/s, which satisfies the requirements of high-speed applications.展开更多
Low power and real time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. The power reduction is achieved by devising a novel...Low power and real time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. The power reduction is achieved by devising a novel correction recovery mechanism based on algorithms which allow the use of reduced bit sum of absolute difference (RBSAD) metric for calculating matching error and conversion to full resolution sum of absolute difference (SAD) metric whenever necessary. Parallel and pipelined architectures for high throughput of full search ME corresponding to both the full resolution SAD and the generalized RBSAD algorithm are synthe- sized using Xilinx Synthesis Tools (XST), where the ME designs based on reduced bit (RB) algorithms demonstrate the reduction in power consumption up to 45% and/or the reduction in area up to 38%.展开更多
A novel Parallel-Based Lifting Algorithm (PBLA) for Discrete Wavelet Transform (DWT), exploiting the parallelism of arithmetic operations in all lifting steps, is proposed in this paper. It leads to reduce the cri...A novel Parallel-Based Lifting Algorithm (PBLA) for Discrete Wavelet Transform (DWT), exploiting the parallelism of arithmetic operations in all lifting steps, is proposed in this paper. It leads to reduce the critical path latency of computation, and to reduce the complexity of hardware implementation as well. The detailed derivation on the proposed algorithm, as well as the resulting Very Large Scale Integration (VLSI) architecture, is introduced, taking the 9/7 DWT as an example but without loss of generality. In comparison with the Conventional Lifting Algorithm Based Implementation (CLABI), the critical path latency of the proposed architecture is reduced by more than half from (4Tm + 8Ta)to Tm + 4Ta, and is competitive to that of Convolution-Based Implementation (CBI), but the new implementation will save significantly in hardware. The experimental results demonstrate that the proposed architecture has good performance in both increasing working frequency and reducing area.展开更多
The design of space-efficient support hardware for built-in self-testing is of great significance in very large scale integration circuits and systems, particularly in view of the paradigm shift in recent times from s...The design of space-efficient support hardware for built-in self-testing is of great significance in very large scale integration circuits and systems, particularly in view of the paradigm shift in recent times from system-on-board to system-on-chip technology. The subject paper proposes a new approach to designing aliasing-free or zero-aliasing space compaction hardware targeting specifically embedded cores-based system-on-chips for single stuck-line faults extending well-known concept from conventional switching theory, viz. that of compatibility relation as used in the minimization of incomplete sequential machines. For a pair of response outputs of the circuit under test, the method introduces the notion of fault detection compatibility and conditional fault detection compatibility (conditional upon some other response output pair being simultaneously fault detection compatible) with respect to two-input XOR/XNOR logic. The process is illustrated with design details of space compressors for the International Symposium on Circuits and Systems or ISCAS 85 combinational and ISCAS 89 full-scan sequential benchmark circuits using simulation programs ATALANTA and FSIM, attesting to the usefulness of the technique for its relative simplicity, resultant low area overhead and full fault coverage for single stuck-line faults, thus making it suitable in commercial design environments.展开更多
The rapid development in the digital circuit design enhances the applications on very large scale integration era. Encoders are one among the digital circuits found in all communication systems. The polar encoding is ...The rapid development in the digital circuit design enhances the applications on very large scale integration era. Encoders are one among the digital circuits found in all communication systems. The polar encoding is mainly meant for its channel achieving property. It finds its application in communications, sensing and information theory. This coding proposed by Erdal Arikan is significant because of its zero error floors and simple architecture for hardware implementation. In this paper, a folded polar encoder is designed to start from the fully parallel architecture and proceeds with its data flow graph, delay requirement calculation, lifetime analysis and register allocation, which results in a very large scale integration architecture with minimum hardware utilization. The results are simulated for 4 and 8 parallel folded 32-bit polar encoder using Xilinx 14.6 ISIM and implemented in Virtex 5 field programmable gate array. A comparison is made on fully parallel and various folding techniques based on their resource utilization.展开更多
Reversible logic is a new emerging technology with many promising applications in optical information processing, low power (Complementary Metal Oxide Semiconductor) CMOS design, (De Oxy RiboNucleic Acid) DNA computin...Reversible logic is a new emerging technology with many promising applications in optical information processing, low power (Complementary Metal Oxide Semiconductor) CMOS design, (De Oxy RiboNucleic Acid) DNA computing, etc. In industrial automation, comparators play an important role in segregating faulty patterns from good ones. In previous works, these comparators have been implemented with more number of reversible gates and computational complexity. All these comparators use propagation technique to compare the data. This will reduce the efficiency of the comparators. To overcome the problem, this paper proposes an efficient comparator using (Thapliyal Ranganathan) TR gate utilizing full subtraction and half subtraction algorithm which will improve the computation efficiency. The comparator design using half subtraction algorithm shows an improvement in terms of quantum cost. The comparator design using full subtraction algorithm shows effectiveness in reducing number of reversible gates required and garbage output.展开更多
An application specific integrated circuit (ASIC) design of a 1024 points floating-point fast Fourier transform(FFT) processor is presented. It can satisfy the requirement of high accuracy FFT result in related fields...An application specific integrated circuit (ASIC) design of a 1024 points floating-point fast Fourier transform(FFT) processor is presented. It can satisfy the requirement of high accuracy FFT result in related fields. Several novel design techniques for floating-point adder and multiplier are introduced in detail to enhance the speed of the system. At the same time, the power consumption is decreased. The hardware area is effectively reduced as an improved butterfly processor is developed. There is a substantial increase in the performance of the design since a pipelined architecture is adopted, and very large scale integrated (VLSI) is easy to realize due to the regularity. A result of validation using field programmable gate array (FPGA) is shown at the end. When the system clock is set to 50 MHz, 204.8 μs is needed to complete the operation of FFT computation.展开更多
The state-of-the-art multi-core computer systems are based on Very Large Scale three Dimensional (3D) Integrated circuits (VLSI). In order to provide high-speed vertical data transmission in such 3D systems, efficient...The state-of-the-art multi-core computer systems are based on Very Large Scale three Dimensional (3D) Integrated circuits (VLSI). In order to provide high-speed vertical data transmission in such 3D systems, efficient Through-Silicon Via (TSV) technology is critically important. In this paper, various Radio Frequency (RF) TSV designs and models are proposed. Specifically, the Cu-plug TSV with surrounding ground TSVs is used as the baseline structure. For further improvement, the dielectric coaxial and novel air-gap coaxial TSVs are introduced. Using the empirical parameters of these coaxial TSVs, the simulation results are obtained demonstrating that these coaxial RF-TSVs can provide two-order higher of cut-off frequencies than the Cu-plug TSVs. Based on these new RF-TSV technologies, we propose a novel 3D multi-core computer system as well as new architectures for manipulating the interfaces between RF and baseband circuit. Taking into consideration the scaling down of IC manufacture technologies, predictions for the performance of future generations of circuits are made. With simulation results indicating energy per bit and area per bit being reduced by 7% and 11% respectively, we can conclude that the proposed method is a worthwhile guideline for the design of future multi-core computer ICs.展开更多
In order to develop the core chip supporting binocular stereo displays for head mounted display (HMD) and glasses-TV, a very large scale integrated (VISI) design scheme is proposed by using a pipeline architecture...In order to develop the core chip supporting binocular stereo displays for head mounted display (HMD) and glasses-TV, a very large scale integrated (VISI) design scheme is proposed by using a pipeline architecture for 3D display processing chip (HMD100). Some key techniques including stereo display processing and high precision video scaling based bicubic interpolation, and their hardware implementations which improve the image quality are presented. The proposed HMD100 chip is verified by the field-programmable gate array (FPGA). As one of innovative and high integration SoC chips, HMD100 is designed by a digital and analog mixed circuit. It can support binocular stereo display, has better scaling effect and integration. Hence it is applicable in virtual reality (VR), 3D games and other microdisplay domains.展开更多
This paper proposes a low-power MOS current mode logic (MCML) circuit with sleep-transistor to reduce the leakage current. The sleep-transistor is used to high-threshold voltage transistor to minimize the leakage cu...This paper proposes a low-power MOS current mode logic (MCML) circuit with sleep-transistor to reduce the leakage current. The sleep-transistor is used to high-threshold voltage transistor to minimize the leakage current. The 16× 16 bit parallel multiplier is designed with the proposed technology. Comparing with the previous MCML circuit, the circuit achieves the reduction of the power consumption in sleep mode by 1/258. This circuit is designed with Samsung 0.35 um complementary metal oxide semiconductor (CMOS) process. The validity and effectiveness are verified through the HSPICE simulation.展开更多
Triple-threshold CMOS technique provides the transistors that have low-, normal-, and high-threshold voltage. This paper describes a low-power carry look-ahead adder with triple-threshold CMOS technique. While the low...Triple-threshold CMOS technique provides the transistors that have low-, normal-, and high-threshold voltage. This paper describes a low-power carry look-ahead adder with triple-threshold CMOS technique. While the low-threshold voltage transistors are used to reduce the propagation delay time in the critical path, the high-threshold voltage transistors are used to reduce the power consumption in the shortest path. Comparing with the conventional CMOS circuit, the circuit is achieved to reduce the power consumption by 14.71% and the power-delay-product by 16.11%. This circuit is designed with Samsung 0.35 um CMOS process. The validity and effectiveness are verified through the HSPICE simulation.展开更多
The influence of an electric field on metallic single walled carbon nanotube (SWCNT) interconnects is studied. A voltage-dependent equivalent circuit model is presented for the impedance parameters of single-wall ca...The influence of an electric field on metallic single walled carbon nanotube (SWCNT) interconnects is studied. A voltage-dependent equivalent circuit model is presented for the impedance parameters of single-wall carbon nanotubes that capture various electron-phonon scattering mechanisms as a function of the electric field. To estimate the performance of SWCNT bundle interconnects, signal delay and power dissipation are calculated based on the field dependent model that results in an improvement in the delay and power estimation accuracy compared to the field-independent model. We find that the power delay product of a SWCNT bundle increases with the increase in electric field but decreases with technology scaling showing that at a low electric field, the SWCNT bundle is a potential reliable alternative interconnect for future high performance VLSI industry at scaled technologies.展开更多
Circular self test path (CSTP) is an attractive technique for testing digital integrated circuits(IC) in the nanometer era, because it can easily provide at-speed test with small test data volume and short test applic...Circular self test path (CSTP) is an attractive technique for testing digital integrated circuits(IC) in the nanometer era, because it can easily provide at-speed test with small test data volume and short test application time. However, CSTP cannot reliably attain high fault coverage because of difficulty of testing random-pattern-resistant faults. This paper presents a deterministic CSTP (DCSTP) structure that consists of a DCSTP chain and jumping logic, to attain high fault coverage with low area overhead. Experimental re- sults on ISCAS’89 benchmarks show that 100% fault coverage can be obtained with low area overhead and CPU time, especially for large circuits.展开更多
A novel pulse stream neuron circuit is presented whose output pulse width facilitates sigmoid activation to activate the function of neurons. The wide symmetrical dynamic range of this neuron ensures high noise immuni...A novel pulse stream neuron circuit is presented whose output pulse width facilitates sigmoid activation to activate the function of neurons. The wide symmetrical dynamic range of this neuron ensures high noise immunity. The pulsed activation strategy provides a power efficient architecture, so the circuit has very low power dissipation. The simplicity of the circuit ensures its suitability for large-scale integration.展开更多
With technology scaling into nanometer regime, rampant process variations impact visible influences on leakage power estimation of very large scale integrations (VLSIs). In order to deal with the case of large inter- ...With technology scaling into nanometer regime, rampant process variations impact visible influences on leakage power estimation of very large scale integrations (VLSIs). In order to deal with the case of large inter- and intra-die variations, we induce a novel theory prototype of the statistical leakage power analysis (SLPA) for function blocks. Because inter-die variations can be pinned down into a small range but the number of gates in function blocks is large(>1000), we continue to simplify the prototype. At last, we induce the efficient methodology of SLPA. The method can save much running time for SLPA in the low power design since it is of the local-updating advantage. A large number of experimental data show that the method only takes feasible running time (0.32 s) to obtain accurate results (3 σ-error <0.5% on maximum) as function block circuits simultaneous suffer from 7.5%(3 σ/mean) inter-die and 7.5% intra-die length variations, which demonstrates that our method is suitable for statistical leakage power analysis of VLSIs under rampant process variations.展开更多
Circuit net list bipartitioning using simulated annealing technique has been proposed in the paper.The method converges asymptotically and probabilistically to global optimization.The circuit net list is partitioned i...Circuit net list bipartitioning using simulated annealing technique has been proposed in the paper.The method converges asymptotically and probabilistically to global optimization.The circuit net list is partitioned into two partitions such that the number of interconnections between the partitions is minimized.The proposed method begins with an innovative clustering technique to obtain a good initial solution.Results obtained show the versatility of the proposed method in solving non polynomial hard problems of circuit net list partitioning and show an improvement over those available in literature.展开更多
Modular inversion is one of the key arithmetic operations in public key cryptosystems, so low-cost, high-speed hardware implementation is absolutely necessary. This paper presents an algorithm for prime fields for ha...Modular inversion is one of the key arithmetic operations in public key cryptosystems, so low-cost, high-speed hardware implementation is absolutely necessary. This paper presents an algorithm for prime fields for hardware implementation. The algorithm involves only ordinary addition/subtraction and does not need any modular operations, multiplications or divisions. All of the arithmetic operations in the algorithm can be accomplished by only one adder, so it is very suitable for fast very large scale integration (VLSI) implementation. The VLSI implementation of the algorithm is also given with good performance and low silicon penalty.展开更多
Considering the self-heating effect, an accurate expression for the global interconnection resistance per unit length in terms of interconnection wire width and spacing is presented. Based on the proposed resistance m...Considering the self-heating effect, an accurate expression for the global interconnection resistance per unit length in terms of interconnection wire width and spacing is presented. Based on the proposed resistance model and according to the trade-off theory, a novel optimization analytical model of delay, power dissipation and bandwidth is derived. The proposed optimal model is verified and compared based on 90 nm, 65 nm and 40 nm CMOS technologies. It can be found that more optimum results can be easily obtained by the proposed model. This optimization model is more accurate and realistic than the conventional optimization models, and can be integrated into the global interconnection design ofnano-scale integrated circuits.展开更多
基金Supported by the Guangdong Provincial Natural Science Foundation of China(2014A030313441)the Guangzhou Science and Technology Project(201510010169)+1 种基金the Guangdong Province Science and Technology Project(2016B090918071,2014A040401076)the National Natural Science Foundation of China(61072028)
文摘The interconnect temperature of very large scale integration(VLSI) circuits keeps rising due to self-heating and substrate temperature, which can increase the delay and power dissipation of interconnect wires. The thermal vias are regarded as a promising method to improve the temperature performance of VLSI circuits. In this paper, the extra thermal vias were used to decrease the delay and power dissipation of interconnect wires of VLSI circuits. Two analytical models were presented for interconnect temperature, delay and power dissipation with adding extra dummy thermal vias. The influence of the number of thermal vias on the delay and power dissipation of interconnect wires was analyzed and the optimal via separation distance was investigated. The experimental results show that the adding extra dummy thermal vias can reduce the interconnect average temperature, maximum temperature, delay and power dissipation. Moreover, this method is also suitable for clock signal wires with a large root mean square current.
文摘Guest Editor in Chief:Professor Xiao-Ping Zhang, University of Birmingham The potential for renewable energy to make contributions to mitigating the impact of climate change is expected to increase significantly in the longer term. Renewable energy generation technologies including onshore wind, offshore wind, wave,tidal, marine current, and ocean thermal energy generation as well as PV power generation, which are considered
基金The National Science and Technology M ajor Project of the M inistry of Science and Technology of China(No.2014ZX03003007-009)
文摘To achieve high parallel computation of discrete wavelet transform (DWT) in JPEG2000, a high-throughput two-dimensional (2D) 9/7 DWT very large scale integration (VLSI) design is proposed, in which the row processor is based on flipping structure. Due to the difference of the input data flow, the column processor is obtained by adding the input selector and data buffer to the row processor. Normalization steps in row and column DWT are combined to reduce the number of multipliers, and the rationality is verified. By rearranging the output of four-line row DWT with a multiplexer (MUX), the amount of data processed by each column processor becomes half, and the four-input/four- output architecture is implemented. For an image with the size of N x N, the computing time of one-level 2D 9/7 DWT is 0.25N2 + 1.5N clock cycles. The critical path delay is one multiplier delay, and only 5N internal memory is required. The results of post-route simulation on FPGA show that clock frequency reaches 136 MHz, and the throughput is 544 Msample/s, which satisfies the requirements of high-speed applications.
文摘Low power and real time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. The power reduction is achieved by devising a novel correction recovery mechanism based on algorithms which allow the use of reduced bit sum of absolute difference (RBSAD) metric for calculating matching error and conversion to full resolution sum of absolute difference (SAD) metric whenever necessary. Parallel and pipelined architectures for high throughput of full search ME corresponding to both the full resolution SAD and the generalized RBSAD algorithm are synthe- sized using Xilinx Synthesis Tools (XST), where the ME designs based on reduced bit (RB) algorithms demonstrate the reduction in power consumption up to 45% and/or the reduction in area up to 38%.
基金Supported by the National 863 project (No.2002AA133010).
文摘A novel Parallel-Based Lifting Algorithm (PBLA) for Discrete Wavelet Transform (DWT), exploiting the parallelism of arithmetic operations in all lifting steps, is proposed in this paper. It leads to reduce the critical path latency of computation, and to reduce the complexity of hardware implementation as well. The detailed derivation on the proposed algorithm, as well as the resulting Very Large Scale Integration (VLSI) architecture, is introduced, taking the 9/7 DWT as an example but without loss of generality. In comparison with the Conventional Lifting Algorithm Based Implementation (CLABI), the critical path latency of the proposed architecture is reduced by more than half from (4Tm + 8Ta)to Tm + 4Ta, and is competitive to that of Convolution-Based Implementation (CBI), but the new implementation will save significantly in hardware. The experimental results demonstrate that the proposed architecture has good performance in both increasing working frequency and reducing area.
文摘The design of space-efficient support hardware for built-in self-testing is of great significance in very large scale integration circuits and systems, particularly in view of the paradigm shift in recent times from system-on-board to system-on-chip technology. The subject paper proposes a new approach to designing aliasing-free or zero-aliasing space compaction hardware targeting specifically embedded cores-based system-on-chips for single stuck-line faults extending well-known concept from conventional switching theory, viz. that of compatibility relation as used in the minimization of incomplete sequential machines. For a pair of response outputs of the circuit under test, the method introduces the notion of fault detection compatibility and conditional fault detection compatibility (conditional upon some other response output pair being simultaneously fault detection compatible) with respect to two-input XOR/XNOR logic. The process is illustrated with design details of space compressors for the International Symposium on Circuits and Systems or ISCAS 85 combinational and ISCAS 89 full-scan sequential benchmark circuits using simulation programs ATALANTA and FSIM, attesting to the usefulness of the technique for its relative simplicity, resultant low area overhead and full fault coverage for single stuck-line faults, thus making it suitable in commercial design environments.
文摘The rapid development in the digital circuit design enhances the applications on very large scale integration era. Encoders are one among the digital circuits found in all communication systems. The polar encoding is mainly meant for its channel achieving property. It finds its application in communications, sensing and information theory. This coding proposed by Erdal Arikan is significant because of its zero error floors and simple architecture for hardware implementation. In this paper, a folded polar encoder is designed to start from the fully parallel architecture and proceeds with its data flow graph, delay requirement calculation, lifetime analysis and register allocation, which results in a very large scale integration architecture with minimum hardware utilization. The results are simulated for 4 and 8 parallel folded 32-bit polar encoder using Xilinx 14.6 ISIM and implemented in Virtex 5 field programmable gate array. A comparison is made on fully parallel and various folding techniques based on their resource utilization.
文摘Reversible logic is a new emerging technology with many promising applications in optical information processing, low power (Complementary Metal Oxide Semiconductor) CMOS design, (De Oxy RiboNucleic Acid) DNA computing, etc. In industrial automation, comparators play an important role in segregating faulty patterns from good ones. In previous works, these comparators have been implemented with more number of reversible gates and computational complexity. All these comparators use propagation technique to compare the data. This will reduce the efficiency of the comparators. To overcome the problem, this paper proposes an efficient comparator using (Thapliyal Ranganathan) TR gate utilizing full subtraction and half subtraction algorithm which will improve the computation efficiency. The comparator design using half subtraction algorithm shows an improvement in terms of quantum cost. The comparator design using full subtraction algorithm shows effectiveness in reducing number of reversible gates required and garbage output.
文摘An application specific integrated circuit (ASIC) design of a 1024 points floating-point fast Fourier transform(FFT) processor is presented. It can satisfy the requirement of high accuracy FFT result in related fields. Several novel design techniques for floating-point adder and multiplier are introduced in detail to enhance the speed of the system. At the same time, the power consumption is decreased. The hardware area is effectively reduced as an improved butterfly processor is developed. There is a substantial increase in the performance of the design since a pipelined architecture is adopted, and very large scale integrated (VLSI) is easy to realize due to the regularity. A result of validation using field programmable gate array (FPGA) is shown at the end. When the system clock is set to 50 MHz, 204.8 μs is needed to complete the operation of FFT computation.
文摘The state-of-the-art multi-core computer systems are based on Very Large Scale three Dimensional (3D) Integrated circuits (VLSI). In order to provide high-speed vertical data transmission in such 3D systems, efficient Through-Silicon Via (TSV) technology is critically important. In this paper, various Radio Frequency (RF) TSV designs and models are proposed. Specifically, the Cu-plug TSV with surrounding ground TSVs is used as the baseline structure. For further improvement, the dielectric coaxial and novel air-gap coaxial TSVs are introduced. Using the empirical parameters of these coaxial TSVs, the simulation results are obtained demonstrating that these coaxial RF-TSVs can provide two-order higher of cut-off frequencies than the Cu-plug TSVs. Based on these new RF-TSV technologies, we propose a novel 3D multi-core computer system as well as new architectures for manipulating the interfaces between RF and baseband circuit. Taking into consideration the scaling down of IC manufacture technologies, predictions for the performance of future generations of circuits are made. With simulation results indicating energy per bit and area per bit being reduced by 7% and 11% respectively, we can conclude that the proposed method is a worthwhile guideline for the design of future multi-core computer ICs.
文摘In order to develop the core chip supporting binocular stereo displays for head mounted display (HMD) and glasses-TV, a very large scale integrated (VISI) design scheme is proposed by using a pipeline architecture for 3D display processing chip (HMD100). Some key techniques including stereo display processing and high precision video scaling based bicubic interpolation, and their hardware implementations which improve the image quality are presented. The proposed HMD100 chip is verified by the field-programmable gate array (FPGA). As one of innovative and high integration SoC chips, HMD100 is designed by a digital and analog mixed circuit. It can support binocular stereo display, has better scaling effect and integration. Hence it is applicable in virtual reality (VR), 3D games and other microdisplay domains.
文摘This paper proposes a low-power MOS current mode logic (MCML) circuit with sleep-transistor to reduce the leakage current. The sleep-transistor is used to high-threshold voltage transistor to minimize the leakage current. The 16× 16 bit parallel multiplier is designed with the proposed technology. Comparing with the previous MCML circuit, the circuit achieves the reduction of the power consumption in sleep mode by 1/258. This circuit is designed with Samsung 0.35 um complementary metal oxide semiconductor (CMOS) process. The validity and effectiveness are verified through the HSPICE simulation.
文摘Triple-threshold CMOS technique provides the transistors that have low-, normal-, and high-threshold voltage. This paper describes a low-power carry look-ahead adder with triple-threshold CMOS technique. While the low-threshold voltage transistors are used to reduce the propagation delay time in the critical path, the high-threshold voltage transistors are used to reduce the power consumption in the shortest path. Comparing with the conventional CMOS circuit, the circuit is achieved to reduce the power consumption by 14.71% and the power-delay-product by 16.11%. This circuit is designed with Samsung 0.35 um CMOS process. The validity and effectiveness are verified through the HSPICE simulation.
文摘The influence of an electric field on metallic single walled carbon nanotube (SWCNT) interconnects is studied. A voltage-dependent equivalent circuit model is presented for the impedance parameters of single-wall carbon nanotubes that capture various electron-phonon scattering mechanisms as a function of the electric field. To estimate the performance of SWCNT bundle interconnects, signal delay and power dissipation are calculated based on the field dependent model that results in an improvement in the delay and power estimation accuracy compared to the field-independent model. We find that the power delay product of a SWCNT bundle increases with the increase in electric field but decreases with technology scaling showing that at a low electric field, the SWCNT bundle is a potential reliable alternative interconnect for future high performance VLSI industry at scaled technologies.
基金the National Natural Science Foundation of China (Nos. 60633060 and 60576031)the National Basic Research and Development (973) Program of China (No. 2005CB321604)
文摘Circular self test path (CSTP) is an attractive technique for testing digital integrated circuits(IC) in the nanometer era, because it can easily provide at-speed test with small test data volume and short test application time. However, CSTP cannot reliably attain high fault coverage because of difficulty of testing random-pattern-resistant faults. This paper presents a deterministic CSTP (DCSTP) structure that consists of a DCSTP chain and jumping logic, to attain high fault coverage with low area overhead. Experimental re- sults on ISCAS’89 benchmarks show that 100% fault coverage can be obtained with low area overhead and CPU time, especially for large circuits.
基金Supported by the National Natural Science Foundationof China (No.6963 60 3 0)
文摘A novel pulse stream neuron circuit is presented whose output pulse width facilitates sigmoid activation to activate the function of neurons. The wide symmetrical dynamic range of this neuron ensures high noise immunity. The pulsed activation strategy provides a power efficient architecture, so the circuit has very low power dissipation. The simplicity of the circuit ensures its suitability for large-scale integration.
基金the National Natural Science Foundation of China (No.60476014)
文摘With technology scaling into nanometer regime, rampant process variations impact visible influences on leakage power estimation of very large scale integrations (VLSIs). In order to deal with the case of large inter- and intra-die variations, we induce a novel theory prototype of the statistical leakage power analysis (SLPA) for function blocks. Because inter-die variations can be pinned down into a small range but the number of gates in function blocks is large(>1000), we continue to simplify the prototype. At last, we induce the efficient methodology of SLPA. The method can save much running time for SLPA in the low power design since it is of the local-updating advantage. A large number of experimental data show that the method only takes feasible running time (0.32 s) to obtain accurate results (3 σ-error <0.5% on maximum) as function block circuits simultaneous suffer from 7.5%(3 σ/mean) inter-die and 7.5% intra-die length variations, which demonstrates that our method is suitable for statistical leakage power analysis of VLSIs under rampant process variations.
文摘Circuit net list bipartitioning using simulated annealing technique has been proposed in the paper.The method converges asymptotically and probabilistically to global optimization.The circuit net list is partitioned into two partitions such that the number of interconnections between the partitions is minimized.The proposed method begins with an innovative clustering technique to obtain a good initial solution.Results obtained show the versatility of the proposed method in solving non polynomial hard problems of circuit net list partitioning and show an improvement over those available in literature.
基金Supported by the Prom otion Plan of the Ministry of E-ducation and the National Natural Science Foundationof China(No.2 0 0 2 AA14 10 4 0 )
文摘Modular inversion is one of the key arithmetic operations in public key cryptosystems, so low-cost, high-speed hardware implementation is absolutely necessary. This paper presents an algorithm for prime fields for hardware implementation. The algorithm involves only ordinary addition/subtraction and does not need any modular operations, multiplications or divisions. All of the arithmetic operations in the algorithm can be accomplished by only one adder, so it is very suitable for fast very large scale integration (VLSI) implementation. The VLSI implementation of the algorithm is also given with good performance and low silicon penalty.
基金supported by the National Natural Science Foundation of China(No.60606006)the Key Science&Technology Special Project of Shaanxi Province,China(No.2011KTCQ01-19)the National Defense Pre-Research Foundation of China(No.9140A23060111)
文摘Considering the self-heating effect, an accurate expression for the global interconnection resistance per unit length in terms of interconnection wire width and spacing is presented. Based on the proposed resistance model and according to the trade-off theory, a novel optimization analytical model of delay, power dissipation and bandwidth is derived. The proposed optimal model is verified and compared based on 90 nm, 65 nm and 40 nm CMOS technologies. It can be found that more optimum results can be easily obtained by the proposed model. This optimization model is more accurate and realistic than the conventional optimization models, and can be integrated into the global interconnection design ofnano-scale integrated circuits.