This paper proposes an efficient H.264/AVC entropy decoder.It requires no ROM/RAM fabrication process that decreases fabrication cost and increases operation speed.It was achieved by optimizing lookup tables and inter...This paper proposes an efficient H.264/AVC entropy decoder.It requires no ROM/RAM fabrication process that decreases fabrication cost and increases operation speed.It was achieved by optimizing lookup tables and internal buffers,which significantly improves area,speed,and power.The proposed entropy decoder does not exploit embedded processor for bitstream manipulation, which also improves area,speed,and power.Its gate counts and maximum operation frequency are 77515 gates and 175MHz in 0.18um fabrication process,respectively.The proposed entropy decoder needs 2303 cycles in average for one macroblock decoding.It can run at 28MHz to meet the real-time processing requirement for CIF format video decoding on mobile applications.展开更多
This letter proposes a rate control algorithm for H.264 video encoder, which is based on block activity and buffer state. Experimental results indicate that it has an excellent performance by providing much accurate b...This letter proposes a rate control algorithm for H.264 video encoder, which is based on block activity and buffer state. Experimental results indicate that it has an excellent performance by providing much accurate bit rate and better coding efficiency compared with H.264. The computational complexity of the algorithm is reduced by adopting a novel block activity description method using the Sum of Absolute Difference (SAD) of 16× 16 mode, and its robustness is enhanced by introducing a feedback circuit at frame layer.展开更多
An adaptive pipelining scheme for H.264/AVC context-based adaptive binary arithmetic coding(CABAC) decoder for high definition(HD) applications is proposed to solve data hazard problems coming from the data dependenci...An adaptive pipelining scheme for H.264/AVC context-based adaptive binary arithmetic coding(CABAC) decoder for high definition(HD) applications is proposed to solve data hazard problems coming from the data dependencies in CABAC decoding process.An efficiency model of CABAC decoding pipeline is derived according to the analysis of a common pipeline.Based on that,several adaptive strategies are provided.The pipelining scheme with these strategies can be adaptive to different types of syntax elements(SEs) and the pipeline will not stall during decoding process when these strategies are adopted.In addition,the decoder proposed can fully support H.264/AVC high4:2:2 profile and the experimental results show that the efficiency of decoder is much higher than other architectures with one engine.Taking both performance and cost into consideration,our design makes a good tradeoff compared with other work and it is sufficient for HD real-time decoding.展开更多
Given the substantially increasing complexity of embedded systems, the use of relatively detailed clock cycle-accurate simulators for the design-space exploration is impractical in the early design stages. Raising the...Given the substantially increasing complexity of embedded systems, the use of relatively detailed clock cycle-accurate simulators for the design-space exploration is impractical in the early design stages. Raising the abstraction level is nowadays widely seen as a solution to bridge the gap between the increasing system complexity and the low design productivity. For this, several system-level design tools and methodologies have been introduced to efficiently explore the design space of heterogeneous signal processing systems. In this paper, we demonstrate the effectiveness and the flexibility of the Sesame/Artemis system-level modeling and simulation methodology for efficient peformance evaluation and rapid architectural exploration of the increasing complexity heterogeneous embedded media systems. For this purpose, we have selected a system level design of a very high complexity media application;a H.264/AVC (Advanced Video Codec) video encoder. The encoding performances will be evaluated using system-level simulations targeting multiple heterogeneous multiprocessors platforms.展开更多
In this paper, we propose novel hardware architecture for intra 16 × 16 module for the macroblock engine of a new video coding standard H.264. To reduce the cycle of intra prediction 16 × 16, transform/quant...In this paper, we propose novel hardware architecture for intra 16 × 16 module for the macroblock engine of a new video coding standard H.264. To reduce the cycle of intra prediction 16 × 16, transform/quantization, and inverse quantization/inverse transform of H.264, an advanced method for different operation is proposed. This architecture can process one macroblock in 208 cycles for all cases of macroblock type by processing 4 × 4 Hadamard transform and quantization during 16 × 16 prediction. This module was designed using VHDL Hardware Description Language (HDL) and works with a 160 MHz frequency using ALTERA NIOS-II development board with Stratix II EP2S60F1020C3 FPGA. The system also includes software running on an NIOS-II processor in order to implementing the pre-processing and the post-processing functions. Finally, the execution time of our HW solution is decreased by 26% when compared with the previous work.展开更多
With the increasing demand for flexible and efficient implementation of image and video processing algorithms, there should be a good tradeoff between hardware and software design method. This paper utilized the HW-SW...With the increasing demand for flexible and efficient implementation of image and video processing algorithms, there should be a good tradeoff between hardware and software design method. This paper utilized the HW-SW codesign method to implement the H.264 decoder in an SoC with an ARM core, a multimedia processor and a deblocking filter coprocessor. For the parallel processing features of the multimedia processor, clock cycles of decoding process can be dramatically reduced. And the hardware dedicated deblocking filter coprocessor can improve the efficiency a lot. With maximum clock frequency of 150 MHz, the whole system can achieve real time processing speed and flexibility.展开更多
Image sequences processing and video encoding are extremely time consuming problems. The time complexity of them depends on image contents. This paper presents an estimation of a block motion method for video coding w...Image sequences processing and video encoding are extremely time consuming problems. The time complexity of them depends on image contents. This paper presents an estimation of a block motion method for video coding with edge alignment. This method uses blocks of size 4 × 4 and its basic idea is to find motion vector using the edge position in each video coding block. The method finds the motion vectors more accurately and faster than any known classical method that calculates all the possibilities. Our presented algorithm is compared with known classical algorithms using the evaluation function of the peak signal-to-noise ratio. For comparison of the methods we are using parameters such as time, CPU usage, and size of compressed data. The comparison is made on benchmark data in color format YUV. Results of our proposed method are comparable and in some cases better than results of standard classical algorithms.展开更多
基金sponsored by ETRI System Semiconductor Industry Promotion Center,Human Resource Development Project for SoC Convergence.
文摘This paper proposes an efficient H.264/AVC entropy decoder.It requires no ROM/RAM fabrication process that decreases fabrication cost and increases operation speed.It was achieved by optimizing lookup tables and internal buffers,which significantly improves area,speed,and power.The proposed entropy decoder does not exploit embedded processor for bitstream manipulation, which also improves area,speed,and power.Its gate counts and maximum operation frequency are 77515 gates and 175MHz in 0.18um fabrication process,respectively.The proposed entropy decoder needs 2303 cycles in average for one macroblock decoding.It can run at 28MHz to meet the real-time processing requirement for CIF format video decoding on mobile applications.
基金the National Nature Science Foundation of China(No.90104013) 863 Project(No.2002AA119010, 2001AA121061 and 2002AA123041)
文摘This letter proposes a rate control algorithm for H.264 video encoder, which is based on block activity and buffer state. Experimental results indicate that it has an excellent performance by providing much accurate bit rate and better coding efficiency compared with H.264. The computational complexity of the algorithm is reduced by adopting a novel block activity description method using the Sum of Absolute Difference (SAD) of 16× 16 mode, and its robustness is enhanced by introducing a feedback circuit at frame layer.
基金Supported by the National Natural Science Foundation of China(No.61076021)the National Basic Research Program of China(No.2009CB320903)China Postdoctoral Science Foundation(No.2012M511364)
文摘An adaptive pipelining scheme for H.264/AVC context-based adaptive binary arithmetic coding(CABAC) decoder for high definition(HD) applications is proposed to solve data hazard problems coming from the data dependencies in CABAC decoding process.An efficiency model of CABAC decoding pipeline is derived according to the analysis of a common pipeline.Based on that,several adaptive strategies are provided.The pipelining scheme with these strategies can be adaptive to different types of syntax elements(SEs) and the pipeline will not stall during decoding process when these strategies are adopted.In addition,the decoder proposed can fully support H.264/AVC high4:2:2 profile and the experimental results show that the efficiency of decoder is much higher than other architectures with one engine.Taking both performance and cost into consideration,our design makes a good tradeoff compared with other work and it is sufficient for HD real-time decoding.
文摘Given the substantially increasing complexity of embedded systems, the use of relatively detailed clock cycle-accurate simulators for the design-space exploration is impractical in the early design stages. Raising the abstraction level is nowadays widely seen as a solution to bridge the gap between the increasing system complexity and the low design productivity. For this, several system-level design tools and methodologies have been introduced to efficiently explore the design space of heterogeneous signal processing systems. In this paper, we demonstrate the effectiveness and the flexibility of the Sesame/Artemis system-level modeling and simulation methodology for efficient peformance evaluation and rapid architectural exploration of the increasing complexity heterogeneous embedded media systems. For this purpose, we have selected a system level design of a very high complexity media application;a H.264/AVC (Advanced Video Codec) video encoder. The encoding performances will be evaluated using system-level simulations targeting multiple heterogeneous multiprocessors platforms.
文摘In this paper, we propose novel hardware architecture for intra 16 × 16 module for the macroblock engine of a new video coding standard H.264. To reduce the cycle of intra prediction 16 × 16, transform/quantization, and inverse quantization/inverse transform of H.264, an advanced method for different operation is proposed. This architecture can process one macroblock in 208 cycles for all cases of macroblock type by processing 4 × 4 Hadamard transform and quantization during 16 × 16 prediction. This module was designed using VHDL Hardware Description Language (HDL) and works with a 160 MHz frequency using ALTERA NIOS-II development board with Stratix II EP2S60F1020C3 FPGA. The system also includes software running on an NIOS-II processor in order to implementing the pre-processing and the post-processing functions. Finally, the execution time of our HW solution is decreased by 26% when compared with the previous work.
文摘With the increasing demand for flexible and efficient implementation of image and video processing algorithms, there should be a good tradeoff between hardware and software design method. This paper utilized the HW-SW codesign method to implement the H.264 decoder in an SoC with an ARM core, a multimedia processor and a deblocking filter coprocessor. For the parallel processing features of the multimedia processor, clock cycles of decoding process can be dramatically reduced. And the hardware dedicated deblocking filter coprocessor can improve the efficiency a lot. With maximum clock frequency of 150 MHz, the whole system can achieve real time processing speed and flexibility.
文摘Image sequences processing and video encoding are extremely time consuming problems. The time complexity of them depends on image contents. This paper presents an estimation of a block motion method for video coding with edge alignment. This method uses blocks of size 4 × 4 and its basic idea is to find motion vector using the edge position in each video coding block. The method finds the motion vectors more accurately and faster than any known classical method that calculates all the possibilities. Our presented algorithm is compared with known classical algorithms using the evaluation function of the peak signal-to-noise ratio. For comparison of the methods we are using parameters such as time, CPU usage, and size of compressed data. The comparison is made on benchmark data in color format YUV. Results of our proposed method are comparable and in some cases better than results of standard classical algorithms.