This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Syste...This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.展开更多
In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural o...In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.展开更多
In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC...In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.展开更多
In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and ...In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and disadvantages of them are listed. A modified overlapped partly parallel decoding that not only inherits the advantages of the two algorithms, but also overcomes the shortcomings of the two algorithms is proposed. The simulation results show that the three kinds of decoding have the same decoding performance; modified overlapped partly parallel decoding improves the iterative convergence rate and the throughput of system.展开更多
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Co...The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.展开更多
The time delay of Turbo codes due to its iterative decoding is the main bottleneck of its application in real-time channel. However, the time delay can be greatly shortened through the adoption of parallel decod-ing a...The time delay of Turbo codes due to its iterative decoding is the main bottleneck of its application in real-time channel. However, the time delay can be greatly shortened through the adoption of parallel decod-ing algorithm, dividing the received bits into several sub-blocks and processing in parallel. This letter mainly discusses the applicability of turbo codes in high-speed real-time channel through the study of a parallel turbo decoding algorithm based on 3GPP-proposed turbo encoder and interleaver in various channel. Simulation re-sult shows that, by choosing an appropriate sub-block length, the time delay can be obviously shortened with-out degrading the performance and increasing hardware complexity, and furthermore indicates the applicability of Turbo codes in high-speed real-time channel.展开更多
基金supported by the Fundamental Research Funds for the Central Universities(FRF-TP20-062A1)Guangdong Basic and Applied Basic Research Foundation(2021A1515110070)。
文摘This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.
基金supported by the National High-Technology Research and Development Program of China (Grant No.2003AA123310), and the National Natural Science Foundation of China (Grant Nos.60332030, 60572157)
文摘In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.
文摘In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.
基金Sponsored by the National Natural Science Foundation of China( Grant No. 61032003)the Fundamental Research Funds for the Central Universities( Grant No. HIT. NSRIF.2012021)
文摘In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and disadvantages of them are listed. A modified overlapped partly parallel decoding that not only inherits the advantages of the two algorithms, but also overcomes the shortcomings of the two algorithms is proposed. The simulation results show that the three kinds of decoding have the same decoding performance; modified overlapped partly parallel decoding improves the iterative convergence rate and the throughput of system.
文摘The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.
文摘The time delay of Turbo codes due to its iterative decoding is the main bottleneck of its application in real-time channel. However, the time delay can be greatly shortened through the adoption of parallel decod-ing algorithm, dividing the received bits into several sub-blocks and processing in parallel. This letter mainly discusses the applicability of turbo codes in high-speed real-time channel through the study of a parallel turbo decoding algorithm based on 3GPP-proposed turbo encoder and interleaver in various channel. Simulation re-sult shows that, by choosing an appropriate sub-block length, the time delay can be obviously shortened with-out degrading the performance and increasing hardware complexity, and furthermore indicates the applicability of Turbo codes in high-speed real-time channel.