Some energy experts believe that solar photovoltaic (PV) power generation is likely to be applied on a large scale and to account for a certain share of the future energy structure. In this paper, based on forecasts of electric load demand and of the power-generation energy mix in the mid-21st century, a picture of very large scale PV (VLS-PV) power generation is constructed. The operating characteristics of VLS-PV generation and the adaptability of the power grid to it are analyzed. Ways of transmitting large amounts of PV power, together with the economic and technical bottlenecks to applying VLS-PV generation, are discussed. Finally, steps and suggestions for developing VLS-PV power generation and its associated power system in China are proposed.
A very large scale wind turbine can be built as a circular, large-scale stator frame. The frame can reach several kilometers in diameter and several hundred meters in height, and it carries many circular sail trains. The stator frame can use a lightweight tubular design, so wind can blow through it almost freely. Train rails are fixed to the outer surface of the frame as horizontal rings. The rails of one ring can be several meters apart, so the frame can hold ten or more rings. Each rail ring supports one sail train, which the wind drives around the frame. The energy of this motion is converted into electric power and transmitted to the base of the frame. This design can be realized at a very large scale, which is difficult to achieve with a traditional three-blade turbine.
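The abstract gives no power figures. As a rough orientation only, the standard wind-power relation P = 1/2 * rho * A * v^3 * Cp can be applied to the frame. The sketch assumes, hypothetically, that the frame intercepts wind over its projected area (diameter x height) with an overall power coefficient below the Betz limit of 16/27. The function name, air density, coefficient, and frame dimensions are illustrative assumptions, not values from the design.

```python
AIR_DENSITY = 1.225   # kg/m^3, standard sea-level air (assumed)
BETZ_LIMIT = 16 / 27  # theoretical maximum power coefficient

def frame_wind_power(diameter_m, height_m, wind_speed_ms, power_coefficient=0.3):
    """Rough power estimate (W) for the ring frame (hypothetical helper).

    Assumes the frame intercepts wind over its projected area
    (diameter x height) and converts it with an assumed overall coefficient.
    """
    if not 0 < power_coefficient <= BETZ_LIMIT:
        raise ValueError("power coefficient must be in (0, 16/27]")
    area = diameter_m * height_m  # projected frontal area, m^2
    return 0.5 * AIR_DENSITY * area * wind_speed_ms ** 3 * power_coefficient

# Hypothetical 2 km x 300 m frame in an 8 m/s wind
print(f"{frame_wind_power(2000, 300, 8) / 1e6:.1f} MW")  # 56.4 MW
```

The cubic dependence on wind speed is the main lever: doubling the wind speed multiplies the estimate by eight.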
This study uses real-time synchronous measurements of wind velocity, temperature, and PM10 concentration at heights of 16 m and 47 m during a dust storm event in which the Reynolds number Re exceeds 6×10^6. It reveals very large scale motions (VLSMs) in both the streamwise velocity and the temperature fields at the two heights during the stable stage, with streamwise scales of up to 10 times the boundary-layer thickness. The streamwise velocity and the PM10 concentration show similar frequencies at the peaks of their energy spectra, which implies that the VLSMs of the streamwise flow play a significant role in dust transport. In contrast, the PM10 concentration at 47 m deviates markedly from a Gaussian distribution. This indicates that 47 m lies not within the dust transport layer but in the region where the dust transport layer and the outer flow intersect. The energy spectra of the PM10 concentration at 16 m and 47 m display a "-1" scaling law over the same frequency range (0.001-0.1 Hz) as the wind velocity. This provides new evidence for the existence of a self-similar scaling region in turbulent flow.
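A "-1" scaling law means the energy spectrum follows E(f) ∝ f^-1 over a band (here 0.001-0.1 Hz), so its slope on log-log axes is -1. A minimal sketch of checking this, assuming a precomputed spectrum, fits that slope by least squares. The helper name and the synthetic spectrum are illustrative, not the study's data or method.

```python
import math

def spectral_slope(freqs, psd, f_lo=0.001, f_hi=0.1):
    """Least-squares slope of log10(PSD) versus log10(f) within [f_lo, f_hi].

    A slope near -1 indicates the "-1" law, i.e. E(f) ~ f^-1, which is
    equivalent to a flat premultiplied spectrum f * E(f).
    """
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, psd) if f_lo <= f <= f_hi and f > 0 and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two spectral points inside the band")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Synthetic check (not field data): a pure 1/f spectrum from 1e-4 to 1 Hz
freqs = [10 ** (-4 + 0.05 * k) for k in range(81)]
psd = [3.0 / f for f in freqs]
print(round(spectral_slope(freqs, psd), 6))  # -1.0
```

With real measurements, the spectrum would first be estimated (for example by Welch averaging), and the fitted slope compared with -1 only inside the band where a plateau in f * E(f) is visible.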
The interconnect temperature of very large scale integration (VLSI) circuits keeps rising because of self-heating and substrate temperature, which increases the delay and power dissipation of interconnect wires. Thermal vias are regarded as a promising way to improve the thermal performance of VLSI circuits. In this paper, extra dummy thermal vias are used to decrease the delay and power dissipation of VLSI interconnect wires. Two analytical models are presented for interconnect temperature, delay, and power dissipation with added dummy thermal vias. The influence of the number of thermal vias on interconnect delay and power dissipation is analyzed, and the optimal via separation distance is investigated. The experimental results show that adding extra dummy thermal vias reduces the average and maximum interconnect temperatures as well as the delay and power dissipation. Moreover, the method is also suitable for clock signal wires carrying a large root-mean-square current.
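Here is a first-order sketch of why a cooler wire is faster and dissipates less; it is not the paper's two analytical models. Metal resistance rises roughly linearly with temperature, the 50% delay of a distributed RC line is about 0.38RC, and Joule power is I_rms^2 * R. The copper temperature coefficient (about 3.9e-3 per kelvin), the wire values, and the temperatures are assumptions. The sketch also ignores the feedback by which Joule heating itself raises the wire temperature.

```python
COPPER_TCR = 3.9e-3  # 1/K, approximate temperature coefficient of resistance of Cu

def wire_resistance(r_ref_ohm, temp_c, temp_ref_c=25.0, tcr=COPPER_TCR):
    """Linear model R(T) = R_ref * (1 + tcr * (T - T_ref))."""
    return r_ref_ohm * (1.0 + tcr * (temp_c - temp_ref_c))

def rc_delay_50pct(r_ohm, c_farad):
    """50% step-response delay of a distributed RC line, about 0.38 * R * C."""
    return 0.38 * r_ohm * c_farad

def joule_power(i_rms_a, r_ohm):
    """Self-heating (Joule) power I_rms^2 * R."""
    return i_rms_a ** 2 * r_ohm

# Hypothetical wire: 1 kOhm at 25 C, 200 fF, 1 mA rms, cooled from 105 C to 85 C
for temp in (105.0, 85.0):
    r = wire_resistance(1000.0, temp)
    print(f"{temp:.0f} C: R = {r:.0f} Ohm, "
          f"delay = {rc_delay_50pct(r, 200e-15) * 1e12:.1f} ps, "
          f"P = {joule_power(1e-3, r) * 1e3:.3f} mW")
```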
To achieve highly parallel computation of the discrete wavelet transform (DWT) in JPEG2000, a high-throughput two-dimensional (2D) 9/7 DWT very large scale integration (VLSI) design is proposed in which the row processor is based on the flipping structure. Because the input data flow differs, the column processor is obtained by adding an input selector and a data buffer to the row processor. The normalization steps of the row and column DWTs are combined to reduce the number of multipliers, and the validity of this combination is verified. A multiplexer (MUX) rearranges the output of the four-line row DWT, which halves the amount of data processed by each column processor and yields a four-input/four-output architecture. For an N×N image, one level of 2D 9/7 DWT takes 0.25N^2 + 1.5N clock cycles. The critical path delay is one multiplier delay, and only 5N of internal memory is required. Post-route simulation on an FPGA shows a clock frequency of 136 MHz and a throughput of 544 Msample/s, which satisfies the requirements of high-speed applications.
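The reported figures can be cross-checked with simple arithmetic. Assuming the four-input/four-output architecture consumes four samples per clock, 136 MHz × 4 = 544 Msample/s. The cycle formula 0.25N^2 + 1.5N also approaches N^2/4, that is, four samples per cycle, for large N. The sketch below evaluates both. The helper names and image sizes are illustrative.

```python
def dwt_cycles(n):
    """One-level 2D 9/7 DWT cycle count for an n x n image (formula from the abstract)."""
    return 0.25 * n * n + 1.5 * n

def throughput_msample_s(clock_mhz, samples_per_cycle=4):
    """Peak throughput, assuming four samples are consumed per clock (4-in/4-out)."""
    return clock_mhz * samples_per_cycle

clock_mhz = 136.0
for n in (256, 512, 1024):
    cycles = dwt_cycles(n)
    print(f"N={n}: {cycles:.0f} cycles, {cycles / clock_mhz:.1f} us, "
          f"{n * n / cycles:.3f} samples/cycle")
print(throughput_msample_s(clock_mhz), "Msample/s")  # 544.0 Msample/s
```

The 1.5N term is a per-frame latency overhead, so the effective rate gets closer to the peak 544 Msample/s as images grow.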
Tomographic particle image velocimetry was used to visualize quantitatively the three-dimensional coherent structures in the logarithmic region of a turbulent boundary layer in a water tunnel. The Reynolds number based on momentum thickness is Re_θ = 2460. The instantaneous velocity fields give evidence of hairpin vortices aligned in the streamwise direction. These vortices form very long zones of low-speed fluid flanked on either side by high-speed zones. Statistical support for the existence of hairpins is given by the conditionally averaged eddy, whose spanwise width increases with distance from the wall. The main vortex characteristics in different wall-normal regions are reflected by comparing the proportion of ejection events, and their contribution to the Reynolds stress, with those of sweep events. The premultiplied power spectra and two-point correlations indicate large-scale motions in the boundary layer that are consistent with what have been termed very large scale motions (VLSMs). The three-dimensional spatial correlations of the three velocity components further indicate that the elongated low- and high-speed regions are accompanied by counter-rotating roll modes, the statistical imprint of hairpin packets. Together, these make up the characteristic coherent structures of the logarithmic region of the turbulent boundary layer (TBL).
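The ejection/sweep comparison in the abstract is a quadrant analysis of the velocity fluctuations. The minimal sketch below assumes the inputs are already mean-subtracted streamwise (u') and wall-normal (v') fluctuations at one wall-normal position. It omits the "hole" threshold that quadrant analyses often use. The function name and sample values are made up for illustration.

```python
def quadrant_analysis(u_fluct, v_fluct):
    """Quadrant (Q1-Q4) analysis of velocity fluctuations u', v'.

    Q2 (u' < 0, v' > 0) is an ejection and Q4 (u' > 0, v' < 0) is a sweep.
    Returns {quadrant: (fraction_of_samples, fraction_of_total_u'v')}.
    """
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    uv_sums = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
    n = 0
    for u, v in zip(u_fluct, v_fluct):
        n += 1
        if u == 0 or v == 0:
            continue  # on an axis: contributes nothing to u'v'
        q = (1 if v > 0 else 4) if u > 0 else (2 if v > 0 else 3)
        counts[q] += 1
        uv_sums[q] += u * v
    total_uv = sum(uv_sums.values())
    if n == 0 or total_uv == 0:
        raise ValueError("need samples with a nonzero net u'v'")
    return {q: (counts[q] / n, uv_sums[q] / total_uv) for q in (1, 2, 3, 4)}

result = quadrant_analysis([-2.0, 1.0, -1.0, 1.0], [1.0, -2.0, -1.0, 0.5])
print(result[2], result[4])  # (0.25, 0.8) (0.25, 0.8)
```

Contributions are normalized by the net u'v', so Q2 and Q4 come out positive and can exceed 1, while Q1 and Q3 come out negative.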
An application-specific integrated circuit (ASIC) design of a 1024-point floating-point fast Fourier transform (FFT) processor is presented. It meets the need for high-accuracy FFT results in related fields. Several novel design techniques for the floating-point adder and multiplier are described in detail; they increase system speed while also reducing power consumption. An improved butterfly processor effectively reduces hardware area. A pipelined architecture substantially increases the performance of the design, and its regularity makes very large scale integration (VLSI) implementation easy. Validation on a field-programmable gate array (FPGA) is shown at the end: with a 50 MHz system clock, one FFT computation takes 204.8 μs.
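The abstract does not state the FFT radix, so a radix-2 decimation-in-time FFT is assumed here to illustrate the butterfly structure the processor pipelines. This double-precision software sketch does not model the custom floating-point adder and multiplier. It is checked against a direct DFT, and it also confirms that 204.8 μs at 50 MHz is 10,240 clock cycles, i.e., 10 cycles per point for N = 1024.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (double precision).

    len(x) must be a power of two, e.g. 1024 as in the processor above.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    # Reorder the inputs into bit-reversed index order
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages of butterflies
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                top, bottom = a[start + k], w * a[start + k + half]
                a[start + k] = top + bottom
                a[start + k + half] = top - bottom
        size *= 2
    return a

def dft_reference(x):
    """Direct O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# 204.8 us at 50 MHz, expressed in clock cycles for N = 1024
print(round(204.8e-6 * 50e6))  # 10240
```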
Low-power, real-time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. Power is reduced by a novel correction recovery mechanism. It allows the reduced-bit sum of absolute differences (RBSAD) metric to be used for computing the matching error, with conversion to the full-resolution sum of absolute differences (SAD) metric whenever necessary. Parallel and pipelined high-throughput architectures for full-search ME are synthesized using Xilinx Synthesis Technology (XST), for both the full-resolution SAD and the generalized RBSAD algorithms. The ME designs based on reduced-bit (RB) algorithms reduce power consumption by up to 45% and/or area by up to 38%.
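The difference between the two matching metrics can be sketched in a few lines. Here RBSAD is assumed to mean SAD computed on pixels whose low-order bits have been truncated. The paper's exact bit width and its correction recovery mechanism are not reproduced. The function names, 4-bit truncation, block size, search range, and synthetic frames are all illustrative.

```python
import random

def sad(a, b):
    """Full-resolution sum of absolute differences."""
    return sum(abs(p - q) for p, q in zip(a, b))

def rbsad(a, b, dropped_bits=4):
    """Reduced-bit SAD (assumed form): compare only the high-order pixel bits."""
    return sum(abs((p >> dropped_bits) - (q >> dropped_bits)) for p, q in zip(a, b))

def get_block(frame, x, y, size):
    return [frame[y + r][x + c] for r in range(size) for c in range(size)]

def full_search(cur, ref, bx, by, size, search_range, metric):
    """Exhaustive block matching; returns (cost, dx, dy) of the best candidate."""
    height, width = len(ref), len(ref[0])
    target = get_block(cur, bx, by, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue  # candidate falls outside the reference frame
            cost = metric(target, get_block(ref, x, y, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic frames: the current frame is the reference shifted by (2, 1)
rng = random.Random(7)
ref = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, 6, 6, 4, 3, sad), full_search(cur, ref, 6, 6, 4, 3, rbsad))
```

Truncated pixels need narrower absolute-difference units and adder trees, which is where the power and area savings come from. The price is that nearby candidates tie more often, which is what a correction step to full-resolution SAD has to resolve.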
State-of-the-art multi-core computer systems are based on three-dimensional (3D) very large scale integration (VLSI) circuits. Efficient through-silicon via (TSV) technology is critically important for providing high-speed vertical data transmission in such 3D systems. In this paper, various radio frequency (RF) TSV designs and models are proposed. Specifically, a Cu-plug TSV surrounded by ground TSVs is used as the baseline structure. For further improvement, dielectric coaxial TSVs and novel air-gap coaxial TSVs are introduced. Simulations using empirical parameters of these coaxial TSVs show that they provide cut-off frequencies two orders of magnitude higher than those of Cu-plug TSVs. Based on these new RF-TSV technologies, we propose a novel 3D multi-core computer system, as well as new architectures for the interfaces between the RF and baseband circuits. Taking the scaling of IC manufacturing technologies into account, we predict the performance of future circuit generations. The simulation results indicate that energy per bit and area per bit are reduced by 7% and 11%, respectively. We therefore conclude that the proposed method is a worthwhile guideline for designing future multi-core computer ICs.
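The ideal lossless coaxial-line formulas, C' = 2*pi*eps0*eps_r / ln(b/a) and L' = mu0 * ln(b/a) / (2*pi), show why replacing the dielectric with an air gap helps: lowering eps_r lowers the capacitance per unit length and raises the characteristic impedance. This sketch ignores silicon substrate loss and is not the paper's TSV model. The radii and the SiO2 permittivity of 3.9 are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12  # F/m, vacuum permittivity
MU0 = 1.25663706212e-6   # H/m, vacuum permeability

def coax_per_length(inner_radius_m, outer_radius_m, eps_r):
    """Ideal lossless coaxial line: returns (C' in F/m, L' in H/m, Z0 in Ohm)."""
    if not 0 < inner_radius_m < outer_radius_m:
        raise ValueError("need 0 < inner radius < outer radius")
    log_ratio = math.log(outer_radius_m / inner_radius_m)
    c = 2 * math.pi * EPS0 * eps_r / log_ratio
    l = MU0 / (2 * math.pi) * log_ratio
    return c, l, math.sqrt(l / c)

# Hypothetical geometry: 5 um core radius, 10 um outer shield radius
for name, eps_r in (("SiO2 dielectric", 3.9), ("air gap", 1.0)):
    c, l, z0 = coax_per_length(5e-6, 10e-6, eps_r)
    print(f"{name}: C' = {c * 1e12:.1f} pF/m, L' = {l * 1e9:.1f} nH/m, Z0 = {z0:.1f} Ohm")
```

The inductance per unit length does not depend on the gap material, so the capacitance reduction is the whole effect in this idealized picture.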
This paper proposes a novel parallel-based lifting algorithm (PBLA) for the discrete wavelet transform (DWT), which exploits the parallelism of arithmetic operations in all lifting steps. It reduces both the critical path latency of the computation and the complexity of the hardware implementation. The derivation of the algorithm is given in detail, and the resulting very large scale integration (VLSI) architecture is introduced, taking the 9/7 DWT as an example without loss of generality. Compared with the conventional lifting algorithm based implementation (CLABI), the proposed architecture cuts the critical path latency by more than half, from 4Tm + 8Ta to Tm + 4Ta, where Tm and Ta denote the multiplier and adder delays. This latency is competitive with that of the convolution-based implementation (CBI), while the new implementation saves significant hardware. The experimental results demonstrate that the proposed architecture performs well in both increasing the working frequency and reducing area.
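For reference, here is a sketch of the conventional lifting baseline (CLABI) that PBLA restructures; PBLA itself is not reproduced. Each of the four lifting steps is an addition, a multiplication, and an addition in series, which is where the 4Tm + 8Ta critical path comes from. The coefficients are the standard CDF 9/7 lifting constants. The whole-sample symmetric boundary extension and the K scaling convention are assumptions, since conventions differ between references.

```python
# CDF 9/7 lifting coefficients (Daubechies-Sweldens factorization)
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001

def _predict(d, s, coeff):
    """Odd samples: d[i] += coeff * (s[i] + s[i+1]), mirrored at the right edge."""
    n = len(s)
    for i in range(len(d)):
        d[i] += coeff * (s[i] + s[min(i + 1, n - 1)])

def _update(s, d, coeff):
    """Even samples: s[i] += coeff * (d[i-1] + d[i]), mirrored at the left edge."""
    for i in range(len(s)):
        s[i] += coeff * (d[max(i - 1, 0)] + d[i])

def dwt97_forward(x):
    """One level of 1D 9/7 DWT by conventional lifting; returns (low, high)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("even length >= 2 required")
    s, d = list(x[0::2]), list(x[1::2])
    _predict(d, s, ALPHA)  # each step: add, multiply, add in series
    _update(s, d, BETA)
    _predict(d, s, GAMMA)
    _update(s, d, DELTA)
    return [v / K for v in s], [v * K for v in d]

def dwt97_inverse(low, high):
    """Undo the lifting steps in reverse order with negated coefficients."""
    s, d = [v * K for v in low], [v / K for v in high]
    _update(s, d, -DELTA)
    _predict(d, s, -GAMMA)
    _update(s, d, -BETA)
    _predict(d, s, -ALPHA)
    out = []
    for even, odd in zip(s, d):
        out += [even, odd]
    return out

low, high = dwt97_forward([3.0] * 8)  # constant input: high band should vanish
print([round(v, 6) for v in low], [round(v, 6) for v in high])
```

A 2D transform applies the same routine to rows and then to columns.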
To develop a core chip that supports binocular stereo display for head-mounted displays (HMDs) and glasses-type TVs, a very large scale integration (VLSI) design scheme with a pipelined architecture is proposed for a 3D display processing chip (HMD100). Key techniques and their image-quality-improving hardware implementations are presented, including stereo display processing and high-precision video scaling based on bicubic interpolation. The proposed HMD100 chip is verified on a field-programmable gate array (FPGA). As an innovative, highly integrated SoC, HMD100 is designed as a mixed digital and analog circuit. It supports binocular stereo display and offers better scaling quality and integration, so it is applicable to virtual reality (VR), 3D games, and other microdisplay domains.
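Bicubic scaling is commonly implemented with the Keys cubic convolution kernel. Below is a minimal one-dimensional sketch. It assumes a = -0.5 and edge clamping, because the chip's actual kernel parameter, fixed-point precision, and boundary handling are not given. A 2D scaler applies it separably, first along rows and then along columns. The function names are illustrative.

```python
import math

def keys_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is a common choice."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0

def resample_line(pixels, out_len, a=-0.5):
    """Scale one line of pixels to out_len samples using four-tap bicubic weights."""
    n = len(pixels)
    scale = n / out_len
    out = []
    for i in range(out_len):
        src = (i + 0.5) * scale - 0.5  # align pixel centers
        base = math.floor(src)
        t = src - base                 # fractional position in [0, 1)
        acc = 0.0
        for k in range(-1, 3):         # taps at base-1 .. base+2
            idx = min(max(base + k, 0), n - 1)  # clamp at the edges
            acc += pixels[idx] * keys_kernel(t - k, a)
        out.append(acc)
    return out

print([round(v, 2) for v in resample_line([10, 20, 30, 40], 7)])
```

In hardware, the four weights for each fractional phase are usually precomputed into a small table rather than evaluated per pixel.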
The design of space-efficient support hardware for built-in self-testing is of great significance in very large scale integration circuits and systems, particularly in view of the recent paradigm shift from system-on-board to system-on-chip technology. This paper proposes a new approach to designing aliasing-free (zero-aliasing) space compaction hardware for single stuck-line faults, targeted specifically at embedded-core-based systems-on-chip. The approach extends a well-known concept from conventional switching theory, namely the compatibility relation used to minimize incompletely specified sequential machines. For a pair of response outputs of the circuit under test, the method introduces the notions of fault detection compatibility and conditional fault detection compatibility with respect to two-input XOR/XNOR logic. Conditional compatibility holds only when some other response output pair is simultaneously fault detection compatible. The process is illustrated with design details of space compressors for the ISCAS 85 (International Symposium on Circuits and Systems) combinational and ISCAS 89 full-scan sequential benchmark circuits, using the simulation programs ATALANTA and FSIM. The results attest to the usefulness of the technique. It is relatively simple, has low area overhead, and achieves full fault coverage for single stuck-line faults, which makes it suitable for commercial design environments.
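The following is a simplified check in the spirit of fault detection compatibility. Two outputs can share one XOR (or XNOR) gate without aliasing if every fault visible at either output remains visible at their XOR on some test vector. The sketch assumes precomputed fault-free and faulty responses, such as a fault simulator would supply. It does not model the conditional variant, and the function name and toy data are hypothetical.

```python
def pair_xor_compatible(good, faulty_by_fault, i, j):
    """Hypothetical zero-aliasing check for merging outputs i and j with one XOR.

    good: fault-free output tuples, one per test vector.
    faulty_by_fault: {fault_name: faulty output tuples in the same vector order}.
    Returns True if every fault observed at output i or j on some vector is
    still observed at xor(i, j) on some vector (XNOR observes the same faults).
    """
    for responses in faulty_by_fault.values():
        observed = any(r[i] != g[i] or r[j] != g[j] for g, r in zip(good, responses))
        survives = any((r[i] ^ r[j]) != (g[i] ^ g[j]) for g, r in zip(good, responses))
        if observed and not survives:
            return False  # this fault would be aliased by the merged output
    return True

good = [(0, 0, 1), (1, 0, 0)]
faults = {
    "f1": [(1, 1, 1), (1, 0, 0)],  # flips outputs 0 and 1 together on vector 0
    "f2": [(0, 0, 1), (0, 0, 1)],  # flips outputs 0 and 2 together on vector 1
}
print(pair_xor_compatible(good, faults, 0, 1))  # False
print(pair_xor_compatible(good, faults, 1, 2))  # True
```

A compactor design would then choose a set of mutually compatible pairs to merge, much as compatible states are merged when minimizing an incompletely specified machine.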
This paper proposes a low-power MOS current mode logic (MCML) circuit with a sleep transistor to reduce leakage current. A high-threshold-voltage transistor is used as the sleep transistor to minimize the leakage current. A 16×16-bit parallel multiplier is designed with the proposed technique. Compared with the previous MCML circuit, the proposed circuit reduces sleep-mode power consumption to 1/258. The circuit is designed in a Samsung 0.35 μm complementary metal oxide semiconductor (CMOS) process, and its validity and effectiveness are verified through HSPICE simulation.
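A high-threshold sleep transistor is effective because subthreshold current depends exponentially on threshold voltage: I_sub ∝ 10^(-Vth/S), where S is the subthreshold swing. The first-order sketch below assumes S = 90 mV/decade and ignores DIBL, body effect, and temperature. The 1/258 figure comes from the paper's HSPICE simulation, not from this model, and the function name is illustrative.

```python
def leakage_ratio(delta_vth_mv, swing_mv_per_decade=90.0):
    """First-order subthreshold leakage ratio I(low Vth) / I(high Vth).

    Uses I_sub ~ 10^(-Vth / S); ignores DIBL, body effect and temperature.
    """
    return 10 ** (delta_vth_mv / swing_mv_per_decade)

for dv in (100, 200, 300):
    print(f"dVth = {dv} mV -> about {leakage_ratio(dv):.0f}x less subthreshold leakage")
```

Each additional S of threshold voltage (here 90 mV) buys roughly another decade of leakage reduction in this model.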
The triple-threshold CMOS technique provides transistors with low, normal, and high threshold voltages. This paper describes a low-power carry look-ahead adder that uses the triple-threshold CMOS technique. Low-threshold-voltage transistors reduce the propagation delay in the critical path, while high-threshold-voltage transistors reduce the power consumption in the shortest path. Compared with a conventional CMOS circuit, the proposed circuit reduces power consumption by 14.71% and the power-delay product by 16.11%. The circuit is designed in a Samsung 0.35 μm CMOS process, and its validity and effectiveness are verified through HSPICE simulation.
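A carry look-ahead adder computes each carry directly from generate (g = a AND b) and propagate (p = a XOR b) signals instead of rippling it through the bit positions. Below is a behavioral sketch of a 4-bit adder. It says nothing about transistor thresholds, and the helper names are illustrative.

```python
def to_bits(value, width):
    """Little-endian bit list of value."""
    return [(value >> i) & 1 for i in range(width)]

def lookahead_carries(g, p, cin):
    """c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[1]g[0] | p[i]...p[0]cin.

    Every carry is a two-level AND-OR of g, p and cin, with no ripple chain.
    """
    carries = [cin]
    for i in range(len(g)):
        c = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        term = cin
        for k in range(i + 1):
            term &= p[k]
        carries.append(c | term)
    return carries

def cla_add(a, b, width=4, cin=0):
    """Returns (sum mod 2^width, carry out)."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate
    c = lookahead_carries(g, p, cin)
    total = sum((p[i] ^ c[i]) << i for i in range(width))
    return total, c[width]

print(cla_add(9, 7))  # (0, 1): 9 + 7 = 16 overflows 4 bits
```

In a real design, the wide AND-OR terms for the upper carries form the critical path, which is where low-threshold transistors pay off.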
Rapid developments in digital circuit design have broadened the applications of very large scale integration. Encoders are among the digital circuits found in all communication systems. Polar coding is valued mainly for its capacity-achieving property and finds applications in communications, sensing, and information theory. Polar codes, proposed by Erdal Arikan, are significant because they have no error floor and a simple architecture for hardware implementation. In this paper, a folded polar encoder is designed. The design starts from the fully parallel architecture and proceeds through its data flow graph, delay requirement calculation, lifetime analysis, and register allocation, which yields a very large scale integration architecture with minimal hardware utilization. The 4-parallel and 8-parallel folded 32-bit polar encoders are simulated using Xilinx ISE 14.6 ISim and implemented on a Virtex-5 field-programmable gate array. The fully parallel and various folded designs are compared in terms of resource utilization.
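The fully parallel architecture that folding starts from is the polar transform x = u F^(kron n) over GF(2), with F = [[1, 0], [1, 1]]. It consists of log2 N stages of N/2 XOR butterflies, which gives 80 XORs for N = 32. The sketch below uses natural output order; some formulations add a bit-reversal permutation. It does not reproduce the paper's folded schedule or register allocation, and the function names are illustrative.

```python
def polar_encode(u):
    """Polar encoding x = u F^(kron n) over GF(2), with F = [[1, 0], [1, 1]].

    Natural (non-bit-reversed) output order; len(u) must be a power of two.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    x = list(u)
    half = 1
    while half < n:                    # log2(n) butterfly stages
        for start in range(0, n, 2 * half):
            for k in range(start, start + half):
                x[k] ^= x[k + half]    # one XOR per butterfly
        half *= 2
    return x

def xor_count(n):
    """XOR gates in a fully parallel encoder: (n / 2) * log2(n)."""
    return (n // 2) * (n.bit_length() - 1)

print(polar_encode([0, 1, 0, 1]))  # [0, 0, 1, 1]
print(xor_count(32))               # 80
```

Folding time-multiplexes these butterflies onto fewer XOR units over several clock cycles, trading throughput for area. The register allocation step then sizes the storage for intermediate values.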
Reversible logic is an emerging technology with many promising applications in optical information processing, low-power complementary metal oxide semiconductor (CMOS) design, deoxyribonucleic acid (DNA) computing, and other fields. In industrial automation, comparators play an important role in separating faulty patterns from good ones. In previous work, these comparators required more reversible gates and had higher computational complexity. All of them compare data using a propagation technique, which reduces their efficiency. To overcome this problem, this paper proposes an efficient comparator that uses the Thapliyal-Ranganathan (TR) gate with full-subtraction and half-subtraction algorithms, improving computational efficiency. The comparator based on the half-subtraction algorithm improves quantum cost. The comparator based on the full-subtraction algorithm reduces the number of reversible gates and garbage outputs required.
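The comparator logic can be sketched behaviorally. The TR gate is taken as P = A, Q = A XOR B, R = (A AND NOT B) XOR C. With C = 0, one gate yields A > B directly, and Q XOR R gives A < B. The n-bit version uses the final borrow of a ripple full subtraction A - B. This sketch models logic values only, not a reversible netlist, so garbage outputs and quantum cost are not counted; the function names are illustrative.

```python
def tr_gate(a, b, c):
    """Thapliyal-Ranganathan (TR) gate: P = A, Q = A xor B, R = (A and not B) xor C."""
    return a, a ^ b, (a & (b ^ 1)) ^ c

def compare_1bit(a, b):
    """1-bit comparator from one TR gate with C = 0: returns (a > b, a == b, a < b)."""
    _, q, r = tr_gate(a, b, 0)  # q = a xor b, r = a and not b
    return r, q ^ 1, q ^ r      # q ^ r equals (not a) and b

def compare_by_subtraction(a, b, width):
    """n-bit comparison from a ripple of full subtractors computing a - b."""
    borrow, nonzero = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        diff = x ^ y ^ borrow
        borrow = ((x ^ 1) & y) | (((x ^ y) ^ 1) & borrow)  # borrow out of this bit
        nonzero |= diff
    less = borrow          # a final borrow means a < b
    equal = nonzero ^ 1    # all difference bits zero means a == b
    return int(not less and not equal), equal, less

print(compare_by_subtraction(9, 12, 4))  # (0, 0, 1)
```

The TR gate is reversible because its eight input patterns map to eight distinct output patterns, which the test below checks.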
This work studies the influence of the electric field on metallic single-walled carbon nanotube (SWCNT) interconnects. A voltage-dependent equivalent circuit model is presented for the impedance parameters of SWCNTs that captures various electron-phonon scattering mechanisms as functions of the electric field. To estimate the performance of SWCNT bundle interconnects, signal delay and power dissipation are calculated with the field-dependent model. This improves the accuracy of delay and power estimates compared with the field-independent model. We find that the power-delay product of an SWCNT bundle increases with the electric field but decreases with technology scaling. This shows that, at low electric fields, the SWCNT bundle is a potentially reliable alternative interconnect for future high-performance VLSI at scaled technology nodes.
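The following is a much-simplified illustration of why resistance grows with field. For a metallic SWCNT with four conduction channels, R ≈ (h/4q^2)(1 + L/λ_eff). Matthiessen's rule combines an acoustic-phonon mean free path with an optical-phonon emission length ħΩ/(qE), which shrinks as the field E rises. This is not the paper's equivalent circuit model. The 1 μm acoustic mean free path and the 0.16 eV phonon energy are illustrative assumptions, and contact resistance is ignored.

```python
H_PLANCK = 6.62607015e-34        # J*s
Q_E = 1.602176634e-19            # C
R_Q = H_PLANCK / (4 * Q_E ** 2)  # about 6.45 kOhm for the 4 channels of a metallic SWCNT

def swcnt_resistance(length_m, field_v_per_m, mfp_acoustic_m=1e-6, phonon_ev=0.16):
    """Field-dependent resistance of one metallic SWCNT (simplified sketch).

    Matthiessen's rule combines a field-independent acoustic-phonon mean free
    path with an optical-phonon emission length hbar*Omega / (q*E), which
    shrinks as the field grows. Parameter defaults are illustrative only.
    """
    inv_mfp = 1.0 / mfp_acoustic_m
    if field_v_per_m > 0:
        emission_length_m = phonon_ev / field_v_per_m  # (eV / q) / (V/m) gives metres
        inv_mfp += 1.0 / emission_length_m
    return R_Q * (1.0 + length_m * inv_mfp)

for field in (0.0, 1e5, 1e6, 5e6):
    print(f"E = {field:.0e} V/m: R = {swcnt_resistance(1e-6, field) / 1e3:.1f} kOhm")
```

A bundle of n such tubes in parallel would have roughly 1/n of this resistance, ignoring inter-tube coupling.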
Circular self-test path (CSTP) is an attractive technique for testing digital integrated circuits (ICs) in the nanometer era. It easily provides at-speed testing with a small test data volume and a short test application time. However, CSTP cannot reliably attain high fault coverage because random-pattern-resistant faults are difficult to test. This paper presents a deterministic CSTP (DCSTP) structure, consisting of a DCSTP chain and jumping logic, that attains high fault coverage with low area overhead. Experimental results on the ISCAS'89 benchmarks show that 100% fault coverage can be obtained with low area overhead and CPU time, especially for large circuits.
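In a circular self-test path, the test register both drives the circuit under test (CUT) and compacts its responses. Each cell captures its CUT output XOR the previous cell's contents, and the last cell feeds the first. Below is a behavioral sketch with a hypothetical 4-bit toy CUT and one stuck-at fault. A fault is detected when the final signatures differ, although aliasing can hide it. The DCSTP jumping logic is not modeled, and the register width is assumed to equal the number of CUT outputs.

```python
def cstp_step(state, cut):
    """One clock of a circular self-test path.

    The register drives the CUT; cell i then captures CUT output i XOR the
    previous contents of cell i-1 (cell 0 takes the last cell, closing the circle).
    """
    outputs = cut(state)
    return [outputs[i] ^ state[i - 1] for i in range(len(state))]

def cstp_signature(cut, seed, cycles):
    state = list(seed)
    for _ in range(cycles):
        state = cstp_step(state, cut)
    return state

def good_cut(s):
    """Hypothetical 4-input/4-output combinational CUT."""
    return [s[0] & s[1], s[1] | s[2], s[2] ^ s[3], s[0] ^ s[3]]

def faulty_cut(s):
    """The same CUT with output 2 stuck-at-0."""
    out = good_cut(s)
    out[2] = 0
    return out

seed = [1, 0, 0, 1]
print(cstp_signature(good_cut, seed, 20), cstp_signature(faulty_cut, seed, 20))
```

With a CUT that always outputs zeros, the register simply rotates, which is a quick sanity check of the circular wiring.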
A novel pulse-stream neuron circuit is presented whose output pulse width implements a sigmoid activation function. The neuron's wide, symmetrical dynamic range ensures high noise immunity. The pulsed activation strategy yields a power-efficient architecture, so the circuit dissipates very little power. Its simplicity makes it suitable for large-scale integration.
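Here is an idealized behavioral model of the activation. The output pulse width is a sigmoid of the neuron's input, so w(x) + w(-x) = t_max, matching the symmetrical range described above. The time-averaged output of the pulse stream is proportional to width/period. The pulse parameters and function names are assumptions, and the sketch does not model the circuit itself.

```python
import math

def pulse_width(activation, t_max=1.0e-6, gain=1.0):
    """Output pulse width as a sigmoid of the input (numerically stable form)."""
    z = gain * activation
    if z >= 0:
        frac = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        frac = e / (1.0 + e)
    return t_max * frac

def mean_output(activation, period, t_max=1.0e-6, gain=1.0, v_high=1.0):
    """Time-averaged output of a stream with one pulse per period."""
    width = pulse_width(activation, t_max, gain)
    if width > period:
        raise ValueError("pulse width exceeds the pulse period")
    return v_high * width / period

for a in (-4, -1, 0, 1, 4):
    print(a, f"{pulse_width(a) * 1e9:.1f} ns")
```

Encoding the activation in time rather than in amplitude is what lets a downstream stage integrate it with simple, low-power circuitry.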
With technology scaling into the nanometer regime, rampant process variations visibly affect the leakage power estimation of very large scale integration (VLSI) circuits. To handle large inter-die and intra-die variations, we derive a novel theoretical prototype of statistical leakage power analysis (SLPA) for function blocks. Inter-die variations can be confined to a small range, while the number of gates in a function block is large (>1000). We therefore simplify the prototype further and finally arrive at an efficient SLPA methodology. Because the method supports local updating, it saves much running time for SLPA in low-power design. Extensive experimental data show that the method needs only a feasible running time (0.32 s) to obtain accurate results (maximum 3σ error < 0.5%). This holds when function-block circuits simultaneously suffer 7.5% (3σ/mean) inter-die and 7.5% intra-die channel-length variations, which demonstrates that the method is suitable for statistical leakage power analysis of VLSI circuits under rampant process variations.
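A brute-force Monte Carlo illustrates why the simplification is reasonable for large blocks. The inter-die shift is shared by every gate, while intra-die variation averages out over many gates, so the inter-die term dominates the block's spread. The exponential leakage model, its sensitivity of 10, and the block size are hypothetical. The 3σ/mean figure of 7.5% is mapped to σ = 2.5% for each component. This is not the paper's SLPA method, which is designed to avoid this kind of sampling.

```python
import math
import random

def block_leakage_samples(n_gates, n_dies, sigma_inter, sigma_intra, sensitivity, seed=0):
    """Monte Carlo leakage of a function block, normalized to its nominal value.

    Per-gate leakage is modeled as exp(-sensitivity * dL), where dL is the
    relative channel-length deviation = shared inter-die part + per-gate
    intra-die part (both zero-mean Gaussian). All parameters are hypothetical.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_dies):
        die_shift = rng.gauss(0.0, sigma_inter)  # same for every gate on this die
        total = sum(math.exp(-sensitivity * (die_shift + rng.gauss(0.0, sigma_intra)))
                    for _ in range(n_gates))
        samples.append(total / n_gates)
    return samples

def mean_and_cv(values):
    """Sample mean and coefficient of variation (std / mean)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var) / mean

# 7.5% 3-sigma/mean length variation -> sigma = 2.5% for both components
samples = block_leakage_samples(n_gates=200, n_dies=500, sigma_inter=0.025,
                                sigma_intra=0.025, sensitivity=10.0, seed=1)
mean, cv = mean_and_cv(samples)
print(f"mean / nominal = {mean:.3f}, std / mean = {cv:.3f}")
```

Because leakage is convex in the length deviation, the mean lands above the nominal value. The spread stays close to the inter-die value alone, since the intra-die contribution shrinks roughly as 1/sqrt(n_gates).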
Funding for the dust-storm VLSM measurement study: supported by the National Natural Science Foundation of China (Grant Nos. 11232006, 11121202, 10972164, 40830103, and 11072097) and the State Key Development Program for Basic Research of China (Grant No. 2009CB421304).
Funding for the thermal-via interconnect study: supported by the Guangdong Provincial Natural Science Foundation of China (2014A030313441), the Guangzhou Science and Technology Project (201510010169), the Guangdong Province Science and Technology Project (2016B090918071, 2014A040401076), and the National Natural Science Foundation of China (61072028).
Funding for the 2D 9/7 DWT VLSI design: the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2014ZX03003007-009).
Funding for the tomographic PIV study: supported by the National Natural Science Foundation of China (10832001 and 10872145) and the State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences.
文摘An application specific integrated circuit (ASIC) design of a 1024 points floating-point fast Fourier transform(FFT) processor is presented. It can satisfy the requirement of high accuracy FFT result in related fields. Several novel design techniques for floating-point adder and multiplier are introduced in detail to enhance the speed of the system. At the same time, the power consumption is decreased. The hardware area is effectively reduced as an improved butterfly processor is developed. There is a substantial increase in the performance of the design since a pipelined architecture is adopted, and very large scale integrated (VLSI) is easy to realize due to the regularity. A result of validation using field programmable gate array (FPGA) is shown at the end. When the system clock is set to 50 MHz, 204.8 μs is needed to complete the operation of FFT computation.
文摘Low power and real time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. The power reduction is achieved by devising a novel correction recovery mechanism based on algorithms which allow the use of reduced bit sum of absolute difference (RBSAD) metric for calculating matching error and conversion to full resolution sum of absolute difference (SAD) metric whenever necessary. Parallel and pipelined architectures for high throughput of full search ME corresponding to both the full resolution SAD and the generalized RBSAD algorithm are synthe- sized using Xilinx Synthesis Tools (XST), where the ME designs based on reduced bit (RB) algorithms demonstrate the reduction in power consumption up to 45% and/or the reduction in area up to 38%.
Abstract: State-of-the-art multi-core computer systems are based on very large scale three-dimensional (3D) integrated circuits (VLSI). Efficient through-silicon via (TSV) technology is critically important for high-speed vertical data transmission in such 3D systems. In this paper, various radio frequency (RF) TSV designs and models are proposed. Specifically, a Cu-plug TSV surrounded by ground TSVs serves as the baseline structure. For further improvement, dielectric coaxial and novel air-gap coaxial TSVs are introduced. Simulations using empirical parameters of these coaxial TSVs show that they provide cut-off frequencies two orders of magnitude higher than those of Cu-plug TSVs. Based on these new RF-TSV technologies, we propose a novel 3D multi-core computer system as well as new architectures for the interfaces between the RF and baseband circuits. Taking the scaling of IC manufacturing technologies into account, we predict the performance of future circuit generations. Simulation results indicate that energy per bit and area per bit are reduced by 7% and 11%, respectively. We therefore conclude that the proposed method is a worthwhile guideline for the design of future multi-core computer ICs.
Funding: Supported by the National 863 Project (No. 2002AA133010).
Abstract: This paper proposes a novel Parallel-Based Lifting Algorithm (PBLA) for the Discrete Wavelet Transform (DWT) that exploits the parallelism of arithmetic operations in all lifting steps. It reduces both the critical path latency of the computation and the complexity of the hardware implementation. The detailed derivation of the proposed algorithm and the resulting Very Large Scale Integration (VLSI) architecture are presented, taking the 9/7 DWT as an example without loss of generality. Compared with the Conventional Lifting Algorithm Based Implementation (CLABI), the proposed architecture cuts the critical path latency by more than half, from 4Tm + 8Ta to Tm + 4Ta (where Tm and Ta denote the multiplier and adder delays). This latency is competitive with that of the Convolution-Based Implementation (CBI), while the new implementation requires significantly less hardware. Experimental results show that the proposed architecture both increases the working frequency and reduces area.
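For reference, here is a minimal sketch of the conventional lifting baseline (the CLABI side of the comparison), not of the proposed PBLA. It applies one level of the 9/7 DWT in Python with the standard CDF 9/7 lifting coefficients and whole-sample symmetric extension at the borders. Each of the four lifting steps computes `d[n] + c * (s[n] + s[n+1])` (or the update analogue): two additions and one multiplication in series. Chaining four such dependent steps gives the 4Tm + 8Ta critical path quoted above. The final scaling constant and its placement follow one common normalization; conventions vary, so treat that part as an assumption.

```python
# Standard CDF 9/7 lifting coefficients.
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
ZETA = 1.149604398   # scaling constant (normalization conventions vary)

def _predict(d, s, c):
    # d[n] += c * (s[n] + s[n+1]); s[n+1] is mirrored at the right border.
    return [d[n] + c * (s[n] + s[min(n + 1, len(s) - 1)]) for n in range(len(d))]

def _update(s, d, c):
    # s[n] += c * (d[n-1] + d[n]); d[n-1] is mirrored at the left border.
    return [s[n] + c * (d[max(n - 1, 0)] + d[n]) for n in range(len(s))]

def dwt97_forward(x):
    """One level of the 9/7 DWT by conventional (sequential) lifting.

    len(x) must be even. Returns (lowpass s, highpass d).
    """
    s, d = list(x[0::2]), list(x[1::2])
    d = _predict(d, s, ALPHA)   # step 1 depends on the input
    s = _update(s, d, BETA)     # step 2 depends on step 1
    d = _predict(d, s, GAMMA)   # step 3 depends on step 2
    s = _update(s, d, DELTA)    # step 4 depends on step 3
    return [v * ZETA for v in s], [v / ZETA for v in d]

def dwt97_inverse(s, d):
    """Undo dwt97_forward by running the lifting steps in reverse with negated coefficients."""
    s = [v / ZETA for v in s]
    d = [v * ZETA for v in d]
    s = _update(s, d, -DELTA)
    d = _predict(d, s, -GAMMA)
    s = _update(s, d, -BETA)
    d = _predict(d, s, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x
```

Each step reuses exactly the values that the matching forward step read, so the inverse reconstructs the input up to floating-point rounding. The strict step-to-step dependency is the serial chain that a parallel reformulation aims to shorten.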
Abstract: To develop a core chip supporting binocular stereo displays for head-mounted displays (HMDs) and glasses-TV, we propose a very large scale integration (VLSI) design scheme with a pipelined architecture for a 3D display processing chip (HMD100). Key techniques, including stereo display processing and high-precision video scaling based on bicubic interpolation, are presented together with their hardware implementations, which improve image quality. The proposed HMD100 chip is verified on a field-programmable gate array (FPGA). HMD100 is an innovative, highly integrated SoC designed as a mixed digital and analog circuit. It supports binocular stereo display and offers better scaling quality and a higher level of integration. It is therefore applicable to virtual reality (VR), 3D games, and other microdisplay domains.
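The bicubic scaling step can be illustrated with the widely used Keys cubic-convolution kernel. The abstract does not say which kernel or coefficient HMD100 uses, so the choice a = -0.5, the endpoint-aligned coordinate mapping, and edge replication at the borders are all assumptions in this sketch. It scales one line of pixels. Because bicubic interpolation is separable, a 2-D frame can be scaled by applying it to every row and then to every column.

```python
import math

def keys_kernel(x, a=-0.5):
    """Keys cubic-convolution kernel; a = -0.5 is a common (assumed) choice."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def scale_line(samples, out_len, a=-0.5):
    """Resample one line of pixels to out_len samples with 4-tap bicubic weights.

    Output sample i maps to source position i * (n - 1) / (out_len - 1), so the
    first and last pixels stay aligned; pixels beyond the borders are replicated.
    """
    n = len(samples)
    step = (n - 1) / (out_len - 1) if out_len > 1 else 0.0
    out = []
    for i in range(out_len):
        pos = i * step
        base = math.floor(pos)
        t = pos - base                         # fractional offset in [0, 1)
        acc = 0.0
        for k in (-1, 0, 1, 2):                # four neighbouring source pixels
            idx = min(max(base + k, 0), n - 1)
            acc += samples[idx] * keys_kernel(t - k, a)
        out.append(acc)
    return out
```

In hardware, the four weights for each fractional offset `t` are usually precomputed into a small coefficient table, so each output pixel costs four multiply-accumulates per dimension.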
Abstract: The design of space-efficient support hardware for built-in self-testing is of great significance in very large scale integration circuits and systems, particularly given the recent paradigm shift from system-on-board to system-on-chip technology. This paper proposes a new approach to designing aliasing-free (zero-aliasing) space compaction hardware for single stuck-line faults, specifically targeting embedded-core-based systems-on-chip. The approach extends a well-known concept from conventional switching theory: the compatibility relation used in the minimization of incompletely specified sequential machines. For a pair of response outputs of the circuit under test, the method introduces the notions of fault detection compatibility and conditional fault detection compatibility with respect to two-input XOR/XNOR logic. Conditional compatibility holds only when some other response output pair is simultaneously fault detection compatible. The process is illustrated with design details of space compressors for the International Symposium on Circuits and Systems (ISCAS) 85 combinational and ISCAS 89 full-scan sequential benchmark circuits, using the simulation programs ATALANTA and FSIM. The results attest to the usefulness of the technique: it is relatively simple, has low area overhead, and achieves full fault coverage for single stuck-line faults. This makes it suitable for commercial design environments.
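The aliasing risk behind "fault detection compatibility" can be shown with a simplified check. This is not the paper's formal definition. It only asks whether merging outputs i and j through one two-input XOR still detects every fault that either output detected on its own. Aliasing occurs when a fault flips both outputs on the same pattern, because the XOR then cancels the error. The data layout (one output tuple per test pattern, one faulty response list per fault, e.g. from a fault simulator) and the function name are assumptions. An XNOR only inverts the merged output, so it gives the same answer.

```python
def xor_merge_preserves_detection(good, faulty_by_fault, i, j):
    """Simplified pairwise check for zero-aliasing XOR space compaction.

    good:            fault-free output tuples, one per test pattern
    faulty_by_fault: dict fault -> faulty output tuples for the same patterns
    Returns (ok, masked): ok is True when no fault that is visible at output i
    or j becomes invisible at i XOR j; masked lists the aliased faults.
    """
    masked = []
    for fault, faulty in faulty_by_fault.items():
        seen_separately = any(g[i] != f[i] or g[j] != f[j]
                              for g, f in zip(good, faulty))
        seen_merged = any((g[i] ^ g[j]) != (f[i] ^ f[j])
                          for g, f in zip(good, faulty))
        if seen_separately and not seen_merged:
            masked.append(fault)   # both outputs flipped together on every detecting pattern
    return (not masked, masked)
```

A full compactor design also has to consider faults that remain visible at other outputs. That global view is what the paper's conditional compatibility addresses and this pairwise sketch omits.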
Abstract: This paper proposes a low-power MOS current mode logic (MCML) circuit that uses a sleep transistor to reduce leakage current. A high-threshold-voltage transistor serves as the sleep transistor to minimize the leakage current. A 16 × 16-bit parallel multiplier is designed with the proposed technique. Compared with the previous MCML circuit, the proposed circuit reduces sleep-mode power consumption to 1/258. The circuit is designed in a Samsung 0.35 μm complementary metal oxide semiconductor (CMOS) process, and its validity and effectiveness are verified through HSPICE simulation.
Abstract: The triple-threshold CMOS technique provides transistors with low, normal, and high threshold voltages. This paper describes a low-power carry look-ahead adder based on this technique. Low-threshold-voltage transistors reduce the propagation delay in the critical path, while high-threshold-voltage transistors reduce power consumption in the shortest path. Compared with the conventional CMOS circuit, the proposed circuit reduces power consumption by 14.71% and the power-delay product by 16.11%. The circuit is designed in a Samsung 0.35 μm CMOS process, and its validity and effectiveness are verified through HSPICE simulation.
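The logic that a carry look-ahead adder implements can be sketched at the bit level in Python. This covers only the Boolean structure, not the triple-threshold transistor assignment. Each carry is formed directly from generate (g = a AND b) and propagate (p = a XOR b) signals as a flat sum of products instead of rippling from bit to bit. The function name and the 4-bit default width are illustrative choices.

```python
def cla_add(a, b, width=4, carry_in=0):
    """Add two width-bit numbers using carry look-ahead logic.

    Returns the (width + 1)-bit sum, including the carry-out.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # generate: a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate: a_i XOR b_i

    def carry_into(i):
        # c_i = g_{i-1} | p_{i-1} g_{i-2} | ... | p_{i-1} ... p_0 c_0
        c = 0
        for k in range(i):
            term = g[k]
            for m in range(k + 1, i):
                term &= p[m]                 # g_k must propagate through bits k+1 .. i-1
            c |= term
        term = carry_in
        for m in range(i):
            term &= p[m]                     # carry_in propagates through bits 0 .. i-1
        return c | term

    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_into(i)) << i   # sum bit s_i = p_i XOR c_i
    return total | (carry_into(width) << width)
```

The long product terms feeding the high-order carries form the critical path, which is where low-threshold devices pay off. The short terms have slack and can tolerate slower high-threshold devices.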
Abstract: Rapid developments in digital circuit design have broadened the applications of very large scale integration. Encoders are among the digital circuits found in all communication systems. Polar coding is valued chiefly for its capacity-achieving property, and it finds applications in communications, sensing, and information theory. The code, proposed by Erdal Arikan, is significant because it has a zero error floor and a simple architecture for hardware implementation. In this paper, a folded polar encoder is designed starting from the fully parallel architecture. The design proceeds through the data flow graph, delay requirement calculation, lifetime analysis, and register allocation, resulting in a very large scale integration architecture with minimum hardware utilization. 4-parallel and 8-parallel folded 32-bit polar encoders are simulated using Xilinx 14.6 ISIM and implemented on a Virtex 5 field programmable gate array. The fully parallel design and various folded designs are compared in terms of resource utilization.
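The fully parallel architecture that the folding starts from is the polar transform x = u · F^{⊗n} over GF(2), with kernel F = [[1, 0], [1, 1]]. The sketch below computes it with log2(N) butterfly stages of N/2 XORs each, natural output order. Arikan's generator matrix also includes a bit-reversal permutation, which is omitted here by assumption. In a fully parallel encoder every XOR is a separate gate, (N/2)·log2(N) = 80 gates for N = 32. Folding time-multiplexes them.

```python
def polar_encode(u):
    """Compute x = u * F^(kron n) over GF(2), F = [[1, 0], [1, 1]], natural order.

    len(u) must be a power of two; bits are 0/1 integers.
    """
    n = len(u)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    x = list(u)
    half = 1
    while half < n:                                   # log2(n) stages
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x[i] ^= x[i + half]                   # one XOR gate of this stage
        half *= 2
    return x
```

Because F · F = I over GF(2), applying the transform twice returns the input, which makes a convenient self-check.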
Abstract: Reversible logic is an emerging technology with many promising applications in optical information processing, low-power complementary metal oxide semiconductor (CMOS) design, deoxyribonucleic acid (DNA) computing, and more. In industrial automation, comparators play an important role in separating faulty patterns from good ones. In previous works, these comparators required larger numbers of reversible gates and higher computational complexity. All of them compare data using a propagation technique, which reduces their efficiency. To overcome this problem, this paper proposes efficient comparators built from the Thapliyal-Ranganathan (TR) gate, using full-subtraction and half-subtraction algorithms that improve computational efficiency. The half-subtraction comparator improves quantum cost, while the full-subtraction comparator reduces the number of reversible gates and garbage outputs.
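As a small illustration of why the TR gate suits subtraction-based comparison, the sketch below models the gate with the mapping (A, B, C) → (P, Q, R) = (A, A ⊕ B, A·B̄ ⊕ C). This is the definition usually attributed to Thapliyal and Ranganathan, assumed here rather than taken from the abstract. With C = 0 it yields a difference bit and an "A greater than B" bit. The `compare_bit` helper is hypothetical and only checks the Boolean behavior. A real reversible comparator would realize the extra AND/NOT with further reversible gates, which this model does not count.

```python
from itertools import product

def tr_gate(a, b, c):
    """TR gate (assumed mapping): P = A, Q = A xor B, R = (A and not B) xor C."""
    return a, a ^ b, (a & (1 - b)) ^ c

def compare_bit(a, b):
    """1-bit comparison from one TR gate with C = 0 (hypothetical helper).

    Returns (a_gt_b, equal, a_lt_b).
    """
    _, diff, a_gt_b = tr_gate(a, b, 0)   # half-subtraction style outputs
    equal = 1 - diff                     # equal when the difference bit is 0
    a_lt_b = diff & (1 - a_gt_b)         # bits differ, and A is not the larger
    return a_gt_b, equal, a_lt_b

# Reversibility: the 8 input patterns map to 8 distinct output patterns.
assert len({tr_gate(a, b, c) for a, b, c in product((0, 1), repeat=3)}) == 8
```

The gate is invertible because A is passed through, B is recovered from Q ⊕ A, and C is recovered from R ⊕ A·B̄.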
Abstract: The influence of the electric field on metallic single-walled carbon nanotube (SWCNT) interconnects is studied. A voltage-dependent equivalent circuit model is presented for the impedance parameters of SWCNTs that captures various electron-phonon scattering mechanisms as a function of the electric field. To estimate the performance of SWCNT bundle interconnects, signal delay and power dissipation are calculated with the field-dependent model, which estimates delay and power more accurately than the field-independent model. We find that the power-delay product of an SWCNT bundle increases with the electric field but decreases with technology scaling. This shows that, at low electric fields, the SWCNT bundle is a potentially reliable alternative interconnect for the future high-performance VLSI industry at scaled technologies.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60633060 and 60576031) and the National Basic Research and Development (973) Program of China (No. 2005CB321604).
Abstract: The circular self-test path (CSTP) is an attractive technique for testing digital integrated circuits (ICs) in the nanometer era because it easily provides at-speed testing with a small test data volume and short test application time. However, CSTP cannot reliably attain high fault coverage because random-pattern-resistant faults are difficult to test. This paper presents a deterministic CSTP (DCSTP) structure, consisting of a DCSTP chain and jumping logic, that attains high fault coverage with low area overhead. Experimental results on the ISCAS'89 benchmarks show that 100% fault coverage can be obtained with low area overhead and CPU time, especially for large circuits.
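For readers unfamiliar with CSTP, here is a behavioral sketch of the conventional circular path, not the DCSTP structure proposed here. Register cells are chained in a ring, and on every clock each cell loads the XOR of its functional input (the circuit-under-test response) and the previous cell's output. The register contents therefore serve as both the test patterns and the accumulated signature. Modeling the circuit under test as a Python function that returns one response bit per cell is an assumption.

```python
def cstp_run(seed, cut, cycles):
    """Simulate a conventional circular self-test path for a number of clocks.

    seed:   initial register contents (list of 0/1 bits)
    cut:    hypothetical model of the circuit under test; maps the current
            register state (the applied pattern) to one response bit per cell
    Returns the final register contents, i.e. the test signature.
    """
    state = list(seed)
    n = len(state)
    for _ in range(cycles):
        response = cut(state)
        # Cell i loads response[i] XOR cell i-1; state[-1] closes the ring.
        state = [response[i] ^ state[i - 1] for i in range(n)]
    return state
```

The final state is compared with the fault-free signature. If the circuit under test always responds with zeros, the ring degenerates into a pure rotation and returns to the seed after n clocks. This kind of weak, input-dependent pattern behavior helps explain why random-pattern-resistant faults limit plain CSTP coverage.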
Funding: Supported by the National Natural Science Foundation of China (No. 69636030).
Abstract: A novel pulse-stream neuron circuit is presented whose output pulse width implements the neuron's sigmoid activation function. The wide, symmetrical dynamic range of this neuron ensures high noise immunity. The pulsed activation strategy yields a power-efficient architecture, giving the circuit very low power dissipation. The simplicity of the circuit makes it well suited to large-scale integration.
Funding: Supported by the National Natural Science Foundation of China (No. 60476014).
Abstract: As technology scales into the nanometer regime, rampant process variations visibly affect the leakage power estimation of very large scale integrated circuits (VLSIs). To handle large inter-die and intra-die variations, we derive a novel theoretical prototype of statistical leakage power analysis (SLPA) for function blocks. Inter-die variations can be confined to a small range, while the number of gates in a function block is large (>1000). We exploit this to further simplify the prototype, finally arriving at an efficient SLPA methodology. Because it supports local updating, the method saves considerable running time for SLPA in low-power design. Extensive experimental data show that the method needs only a modest running time (0.32 s) to obtain accurate results (maximum 3σ error < 0.5%). These results hold when function block circuits simultaneously suffer from 7.5% (3σ/mean) inter-die and 7.5% intra-die length variations. This demonstrates that the method is suitable for statistical leakage power analysis of VLSIs under rampant process variations.
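To show what an SLPA method is approximating, here is a brute-force Monte Carlo reference of the kind such methods aim to replace, not the paper's methodology. Every quantity in the leakage model is an assumption for illustration. Each gate leaks `i0 * exp(-k * dL)`, where dL is the relative channel-length deviation, and `k` is a hypothetical sensitivity. dL is the sum of an inter-die term, drawn once per die and shared by all its gates, and an intra-die term, drawn independently per gate; both are Gaussian. A 7.5% 3σ/mean variation corresponds to σ = 0.025 in these relative units.

```python
import math
import random

def mc_block_leakage(n_gates, n_dies, sigma_inter, sigma_intra, k=10.0, i0=1.0, seed=0):
    """Monte Carlo estimate of a function block's total leakage over sampled dies.

    Assumed model (illustrative only): gate leakage = i0 * exp(-k * dL), with
    dL = inter-die deviation (shared per die) + intra-die deviation (per gate).
    Returns (mean, standard deviation) of the block leakage across dies.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_dies):
        d_inter = rng.gauss(0.0, sigma_inter)          # one draw per die
        total = 0.0
        for _ in range(n_gates):
            d_intra = rng.gauss(0.0, sigma_intra)      # one draw per gate
            total += i0 * math.exp(-k * (d_inter + d_intra))
        totals.append(total)
    mean = sum(totals) / n_dies
    var = sum((t - mean) ** 2 for t in totals) / (n_dies - 1) if n_dies > 1 else 0.0
    return mean, math.sqrt(var)
```

For example, `mc_block_leakage(1000, 200, 0.025, 0.025)` samples 200,000 gate instances. The cost grows with gates × dies, and everything must be rerun after any local change. That expense is what an analytical, locally updatable SLPA avoids.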