A runtime reconfigurable very-large-scale integration (VLSI) architecture for image and video scaling by arbitrary factors with good antialiasing performance is presented in this paper. Video scal- ing is used in a ...A runtime reconfigurable very-large-scale integration (VLSI) architecture for image and video scaling by arbitrary factors with good antialiasing performance is presented in this paper. Video scal- ing is used in a wide range of applications from broadcast, medical imaging and high-resolution video effects to video surveillance, and video conferencing. Many algorithms have been proposed for these applications, such as piecewise polynomial kernels and windowed sinc kernels. The sum of three shifted versions of a B-spline function, whose weights can be adjusted for different applications, is adopted as the main filter. The proposed algorithm is confirmed to be effective on image scaling ap- plications and also verified by many widely acknowledged image quality measures. The reconfigu- rable hardware architecture constitutes an arbitrary scaler with low resource consumption and high performance targeted for field programmable gate array (FPGA) devices. The scaling factor can be changed on-the-fly, and the filter can also be changed during runtime within a unifying framework.展开更多
Low power and real time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. The power reduction is achieved by devising a novel...Low power and real time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. The power reduction is achieved by devising a novel correction recovery mechanism based on algorithms which allow the use of reduced bit sum of absolute difference (RBSAD) metric for calculating matching error and conversion to full resolution sum of absolute difference (SAD) metric whenever necessary. Parallel and pipelined architectures for high throughput of full search ME corresponding to both the full resolution SAD and the generalized RBSAD algorithm are synthe- sized using Xilinx Synthesis Tools (XST), where the ME designs based on reduced bit (RB) algorithms demonstrate the reduction in power consumption up to 45% and/or the reduction in area up to 38%.展开更多
A novel Parallel-Based Lifting Algorithm (PBLA) for Discrete Wavelet Transform (DWT), exploiting the parallelism of arithmetic operations in all lifting steps, is proposed in this paper. It leads to reduce the cri...A novel Parallel-Based Lifting Algorithm (PBLA) for Discrete Wavelet Transform (DWT), exploiting the parallelism of arithmetic operations in all lifting steps, is proposed in this paper. It leads to reduce the critical path latency of computation, and to reduce the complexity of hardware implementation as well. The detailed derivation on the proposed algorithm, as well as the resulting Very Large Scale Integration (VLSI) architecture, is introduced, taking the 9/7 DWT as an example but without loss of generality. In comparison with the Conventional Lifting Algorithm Based Implementation (CLABI), the critical path latency of the proposed architecture is reduced by more than half from (4Tm + 8Ta)to Tm + 4Ta, and is competitive to that of Convolution-Based Implementation (CBI), but the new implementation will save significantly in hardware. The experimental results demonstrate that the proposed architecture has good performance in both increasing working frequency and reducing area.展开更多
The interconnect temperature of very large scale integration(VLSI) circuits keeps rising due to self-heating and substrate temperature, which can increase the delay and power dissipation of interconnect wires. The t...The interconnect temperature of very large scale integration(VLSI) circuits keeps rising due to self-heating and substrate temperature, which can increase the delay and power dissipation of interconnect wires. The thermal vias are regarded as a promising method to improve the temperature performance of VLSI circuits. In this paper, the extra thermal vias were used to decrease the delay and power dissipation of interconnect wires of VLSI circuits. Two analytical models were presented for interconnect temperature, delay and power dissipation with adding extra dummy thermal vias. The influence of the number of thermal vias on the delay and power dissipation of interconnect wires was analyzed and the optimal via separation distance was investigated. The experimental results show that the adding extra dummy thermal vias can reduce the interconnect average temperature, maximum temperature, delay and power dissipation. Moreover, this method is also suitable for clock signal wires with a large root mean square current.展开更多
为提高Exp-Golomb码的编解码效率,提出了一种基于快速"首位1检测"的Exp-Golomb编解码器硬件实现方法,降低了计算量并节省了硬件资源。该Exp-Golomb编解码器已通过RTL(Register Transfer Level)级仿真和综合,并在FPGA(Field Pr...为提高Exp-Golomb码的编解码效率,提出了一种基于快速"首位1检测"的Exp-Golomb编解码器硬件实现方法,降低了计算量并节省了硬件资源。该Exp-Golomb编解码器已通过RTL(Register Transfer Level)级仿真和综合,并在FPGA(Field Programmable Gate Array)开发平台进行了验证,在133 MHz时钟频率下编解码器的综合门数分别为765门和632门。该编解码器能满足Baseline档次(30帧/s),分辨率为352×288视频序列的实时编解码对质量和速度的要求。展开更多
基金Supported by the National Natural Science Foundation of China(No.60972126)the Joint Funds of the National Natural Science Foundation of China(No.U0935002/L05)+1 种基金the Beijing Municipal Natural Science Foundation(No.4102060)the State Key Program of the National Natural Science of China(No.61032007)
文摘A runtime reconfigurable very-large-scale integration (VLSI) architecture for image and video scaling by arbitrary factors with good antialiasing performance is presented in this paper. Video scal- ing is used in a wide range of applications from broadcast, medical imaging and high-resolution video effects to video surveillance, and video conferencing. Many algorithms have been proposed for these applications, such as piecewise polynomial kernels and windowed sinc kernels. The sum of three shifted versions of a B-spline function, whose weights can be adjusted for different applications, is adopted as the main filter. The proposed algorithm is confirmed to be effective on image scaling ap- plications and also verified by many widely acknowledged image quality measures. The reconfigu- rable hardware architecture constitutes an arbitrary scaler with low resource consumption and high performance targeted for field programmable gate array (FPGA) devices. The scaling factor can be changed on-the-fly, and the filter can also be changed during runtime within a unifying framework.
文摘Low power and real time very large scale integration (VLSI) architectures of motion estimation (ME) algorithms for mobile devices and applications are presented. The power reduction is achieved by devising a novel correction recovery mechanism based on algorithms which allow the use of reduced bit sum of absolute difference (RBSAD) metric for calculating matching error and conversion to full resolution sum of absolute difference (SAD) metric whenever necessary. Parallel and pipelined architectures for high throughput of full search ME corresponding to both the full resolution SAD and the generalized RBSAD algorithm are synthe- sized using Xilinx Synthesis Tools (XST), where the ME designs based on reduced bit (RB) algorithms demonstrate the reduction in power consumption up to 45% and/or the reduction in area up to 38%.
基金Supported by the National 863 project (No.2002AA133010).
文摘A novel Parallel-Based Lifting Algorithm (PBLA) for Discrete Wavelet Transform (DWT), exploiting the parallelism of arithmetic operations in all lifting steps, is proposed in this paper. It leads to reduce the critical path latency of computation, and to reduce the complexity of hardware implementation as well. The detailed derivation on the proposed algorithm, as well as the resulting Very Large Scale Integration (VLSI) architecture, is introduced, taking the 9/7 DWT as an example but without loss of generality. In comparison with the Conventional Lifting Algorithm Based Implementation (CLABI), the critical path latency of the proposed architecture is reduced by more than half from (4Tm + 8Ta)to Tm + 4Ta, and is competitive to that of Convolution-Based Implementation (CBI), but the new implementation will save significantly in hardware. The experimental results demonstrate that the proposed architecture has good performance in both increasing working frequency and reducing area.
基金Supported by the Guangdong Provincial Natural Science Foundation of China(2014A030313441)the Guangzhou Science and Technology Project(201510010169)+1 种基金the Guangdong Province Science and Technology Project(2016B090918071,2014A040401076)the National Natural Science Foundation of China(61072028)
文摘The interconnect temperature of very large scale integration(VLSI) circuits keeps rising due to self-heating and substrate temperature, which can increase the delay and power dissipation of interconnect wires. The thermal vias are regarded as a promising method to improve the temperature performance of VLSI circuits. In this paper, the extra thermal vias were used to decrease the delay and power dissipation of interconnect wires of VLSI circuits. Two analytical models were presented for interconnect temperature, delay and power dissipation with adding extra dummy thermal vias. The influence of the number of thermal vias on the delay and power dissipation of interconnect wires was analyzed and the optimal via separation distance was investigated. The experimental results show that the adding extra dummy thermal vias can reduce the interconnect average temperature, maximum temperature, delay and power dissipation. Moreover, this method is also suitable for clock signal wires with a large root mean square current.