To achieve highly parallel computation of the discrete wavelet transform (DWT) in JPEG2000, a high-throughput two-dimensional (2D) 9/7 DWT very large scale integration (VLSI) design is proposed, in which the row processor is based on the flipping structure. Because its input data flow differs, the column processor is obtained by adding an input selector and a data buffer to the row processor. The normalization steps of the row and column DWT are combined to reduce the number of multipliers, and the validity of this combination is verified. By rearranging the output of the four-line row DWT with a multiplexer (MUX), the amount of data processed by each column processor is halved, and a four-input/four-output architecture is implemented. For an image of size N × N, the computing time of a one-level 2D 9/7 DWT is 0.25N^2 + 1.5N clock cycles. The critical path delay is one multiplier delay, and only 5N internal memory is required. Post-route simulation results on an FPGA show that the clock frequency reaches 136 MHz and the throughput is 544 Msample/s, which satisfies the requirements of high-speed applications.
Funding: The National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2014ZX03003007-009)
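The timing and throughput figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the 512 × 512 image size are assumptions, not part of the reported design, and the throughput is modelled simply as four samples accepted per clock cycle, which is consistent with the four-input/four-output architecture and the reported 136 MHz and 544 Msample/s.

```python
def dwt2d_cycles(n: int) -> float:
    """Clock cycles for a one-level 2D 9/7 DWT of an n x n image,
    using the formula stated in the abstract: 0.25*N^2 + 1.5*N."""
    return 0.25 * n * n + 1.5 * n


def throughput_msample_per_s(clock_mhz: float, samples_per_cycle: int = 4) -> float:
    """Throughput in Msample/s, assuming samples_per_cycle inputs per clock
    (4 for a four-input/four-output architecture; an assumption of this sketch)."""
    return clock_mhz * samples_per_cycle


def transform_time_us(n: int, clock_mhz: float) -> float:
    """Time in microseconds for one decomposition level at the given clock."""
    return dwt2d_cycles(n) / clock_mhz


if __name__ == "__main__":
    n = 512            # hypothetical image size, chosen for illustration
    clock_mhz = 136.0  # clock frequency reported in the abstract

    print("cycles:", dwt2d_cycles(n))
    print("throughput (Msample/s):", throughput_msample_per_s(clock_mhz))
    print("time (us): %.2f" % transform_time_us(n, clock_mhz))
```

The dominant 0.25N^2 term corresponds to consuming the N^2 input samples four at a time; the 1.5N term is the additional latency given in the abstract.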