To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with a power-of-two number of memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or to elements at power-of-two intervals in both the horizontal and vertical directions, which was not possible in previous parallel memory architectures. Area and delay estimates are given for 4, 8, and 16 memory modules. In a 0.18-μm CMOS technology, the synthesis results show that the proposed system can achieve a 230 MHz clock frequency with 16 memory modules at a cost of 19k gates when the read and write latencies are 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that the VSP enhanced with the proposed architecture achieves a 1.28× speedup for H.264 real-time decoding.
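To make the notion of conflict-free access concrete, here is a minimal sketch of a textbook linear skewing scheme. It is not one of the two schemes proposed in the paper, whose details the abstract does not give. The mapping `(row + col) mod N` and the names `module_of` and `is_conflict_free` are assumptions chosen for illustration. The sketch shows that this simple mapping serves adjacent elements in a row or a column without conflicts, but fails for a stride of 2 when the module count is a power of two, which is the kind of access the abstract says earlier architectures could not support.

```python
def module_of(row, col, n_modules):
    """Textbook linear skewing (illustrative assumption, not the paper's scheme):
    element (row, col) is stored in module (row + col) mod N."""
    return (row + col) % n_modules


def is_conflict_free(coords, n_modules):
    """An access is conflict-free when every element maps to a different module."""
    modules = [module_of(r, c, n_modules) for r, c in coords]
    return len(set(modules)) == len(modules)


N = 4  # number of memory modules (a power of two)

# N adjacent elements along a row, starting at column 1 of row 2
row_access = [(2, c) for c in range(1, 1 + N)]
# N adjacent elements along column 3
col_access = [(r, 3) for r in range(N)]
# N elements of row 2 taken at a power-of-two interval (stride 2)
strided_access = [(2, 2 * k) for k in range(N)]

print(is_conflict_free(row_access, N))      # adjacent, horizontal
print(is_conflict_free(col_access, N))      # adjacent, vertical
print(is_conflict_free(strided_access, N))  # stride 2, horizontal
```

With four modules, the strided access touches only two distinct modules, so two elements collide in each and the access has to be serialized.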
In this paper, an efficient Cyclic Prefix (CP) reconstruction scheme is proposed for Single-Carrier systems with Frequency-Domain Equalization (SC-FDE) that employ a CP of insufficient length at the transmitter. By utilizing a decision feedback filter to cancel the residual InterSymbol Interference (ISI) in the equalized signal, the proposed scheme can effectively lower the performance lower bound of CP reconstruction schemes and can greatly improve the Bit Error Rate (BER) performance of SC-FDE systems. In addition, the existing methods and the proposed scheme are also optimized. The simulation results show that, when the Signal-to-Noise Ratio (SNR) exceeds a certain threshold, the proposed scheme can achieve the lower bound of performance of the existing methods. Moreover, by increasing the number of iterations or through optimization, this lower bound can be outperformed.
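The problem that CP reconstruction addresses can be illustrated with a small sketch. It does not implement the proposed decision-feedback scheme. It only shows, under assumed toy values, why an insufficient CP matters: frequency-domain equalization relies on the channel acting as a circular convolution on each block, and that holds only when the CP is at least as long as the channel memory. The channel taps, block contents, and function names below are hypothetical.

```python
def linear_channel(tx, h):
    """Linear convolution of the transmitted stream with channel taps h."""
    out = [0] * (len(tx) + len(h) - 1)
    for i, x in enumerate(tx):
        for k, tap in enumerate(h):
            out[i + k] += x * tap
    return out


def circular_convolution(block, h):
    """What the receiver's one-tap frequency-domain equalizer assumes it sees."""
    n = len(block)
    return [sum(h[k] * block[(i - k) % n] for k in range(len(h)))
            for i in range(n)]


def receive_block(block, prev_block, h, cp_len):
    """Send prev_block then block, each prefixed with cp_len CP samples,
    and return the received samples of `block` after CP removal."""
    def add_cp(b):
        return (b[-cp_len:] if cp_len > 0 else []) + b

    tx = add_cp(prev_block) + add_cp(block)
    rx = linear_channel(tx, h)
    start = len(prev_block) + 2 * cp_len  # index of block[0] in the stream
    return rx[start:start + len(block)]


h = [1, 2, 3]            # toy channel with memory 2 (assumed values)
prev_block = [3, 1, 1, 5]
block = [1, 0, 2, 1]

ideal = circular_convolution(block, h)
sufficient = receive_block(block, prev_block, h, cp_len=2)    # CP >= memory
insufficient = receive_block(block, prev_block, h, cp_len=1)  # CP < memory

print(sufficient == ideal)    # circular model holds
print(insufficient == ideal)  # samples leak in from the previous block
```

When the CP is shorter than the channel memory, the first received samples of the block contain energy from the previous block instead of the block's own tail. This is the residual interference that a CP reconstruction scheme has to estimate and remove.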
Funding: Project (No. 2005AA1Z1271) supported by the Hi-Tech Research and Development Program (863) of China