This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Syste...This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.展开更多
A multi dimensional concatenation scheme for block codes is introduced, in which information symbols are interleaved and re encoded for more than once. It provides a convenient platform to design high performance co...A multi dimensional concatenation scheme for block codes is introduced, in which information symbols are interleaved and re encoded for more than once. It provides a convenient platform to design high performance codes with flexible interleaver size. Coset based MAP soft in/soft out decoding algorithms are presented for the F24 code. Simulation results show that the proposed coding scheme can achieve high coding gain with flexible interleaver length and very low decoding complexity.展开更多
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Co...The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.展开更多
Low-density parity-check(LDPC)codes are widely used due to their significant errorcorrection capability and linear decoding complexity.However,it is not sufficient for LDPC codes to satisfy the ultra low bit error rat...Low-density parity-check(LDPC)codes are widely used due to their significant errorcorrection capability and linear decoding complexity.However,it is not sufficient for LDPC codes to satisfy the ultra low bit error rate(BER)requirement of next-generation ultra-high-speed communications due to the error floor phenomenon.According to the residual error characteristics of LDPC codes,we consider using the high rate Reed-Solomon(RS)codes as the outer codes to construct LDPC-RS product codes to eliminate the error floor and propose the hybrid error-erasure-correction decoding algorithm for the outer code to exploit erasure-correction capability effectively.Furthermore,the overall performance of product codes is improved using iteration between outer and inner codes.Simulation results validate that BER of the product code with the proposed hybrid algorithm is lower than that of the product code with no erasure correction.Compared with other product codes using LDPC codes,the proposed LDPC-RS product code with the same code rate has much better performance and smaller rate loss attributed to the maximum distance separable(MDS)property and significant erasure-correction capability of RS codes.展开更多
A new Chien search method for shortened Reed-Solomon (RS) code is proposed, based on this, a versatile RS decoder for correcting both errors and erasures is designed. Compared with the traditional RS decoder, the we...A new Chien search method for shortened Reed-Solomon (RS) code is proposed, based on this, a versatile RS decoder for correcting both errors and erasures is designed. Compared with the traditional RS decoder, the weighted coefficient of the Chien search method is calculated sequentially through the three pipelined stages of the decoder. And therefore, the computation of the errata locator polynomial and errata evaluator polynomial needs to be modified. The versatile RS decoder with minimum distance 21 has been synthesized in the Xilinx Virtex-Ⅱ series field programmable gate array (FPGA) xe2v1000-5 and is used by coneatenated coding system for satellite communication. Results show that the maximum data processing rate can be up to 1.3 Gbit/s.展开更多
In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural o...In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.展开更多
For communication systems with heavy burst noise, an optimal Forward Error Correction(FEC) scheme is expected to have a large burst error correction capability while simultaneously owning moderate random error correct...For communication systems with heavy burst noise, an optimal Forward Error Correction(FEC) scheme is expected to have a large burst error correction capability while simultaneously owning moderate random error correction capability. This letter presents a new FEC scheme based on multiple-symbol interleaved Reed-Solomon codes and an associated two-pass decoding algorithm. It is shown that the proposed multi-symbol interleaved Reed-Solomon scheme can achieve nearly twice as much as the burst error correction capability of conventional single-symbol interleaved Reed-Solomon codes with the same code length and code rate.展开更多
The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered de...The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study the circular-shifting network for decoding LDPC codes with arbitrary Z factor, especially for decoding large Z (Z P) codes, where P is the decoder parallelism. By buffering the P-length slices from the memory, and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for arbitrary sized data shifting consumes only one times of additional resource cost compared to the traditional solution for only maximum P sized data shifting, and achieves significant saving on area and routing complexity.展开更多
In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and ...In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and disadvantages of them are listed. A modified overlapped partly parallel decoding that not only inherits the advantages of the two algorithms, but also overcomes the shortcomings of the two algorithms is proposed. The simulation results show that the three kinds of decoding have the same decoding performance; modified overlapped partly parallel decoding improves the iterative convergence rate and the throughput of system.展开更多
Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memor...Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memory occupation when running on a uniprocessor computer. This paper proposes a parallel decoder for linear block codes, using parallel genetic algorithms (PGA). The good performance and time complexity are confirmed by theoretical study and by simulations on BCH(63,30,14) codes over both AWGN and flat Rayleigh fading channels. The simulation results show that the coding gain between parallel and single genetic algorithm is about 0.7 dB at BER = 10﹣5 with only 4 processors.展开更多
In this paper,a novel dual-metric,the maximum and minimum Squared Euclidean Distance Increment (SEDI) brought by changing the hard decision symbol,is introduced to measure the reli-ability of the received M-ary Phase ...In this paper,a novel dual-metric,the maximum and minimum Squared Euclidean Distance Increment (SEDI) brought by changing the hard decision symbol,is introduced to measure the reli-ability of the received M-ary Phase Shift Keying (MPSK) symbols over a Rayleigh fading channel. Based on the dual-metric,a Chase-type soft decoding algorithm,which is called erased-Chase algorithm,is developed for Reed-Solomon (RS) coded MPSK schemes. The proposed algorithm treats the unre-liable symbols with small maximum SEDI as erasures,and tests the non-erased unreliable symbols with small minimum SEDI as the Chase-2 algorithm does. By introducing optimality test into the decoding procedure,much more reduction in the decoding complexity can be achieved. Simulation results of the RS(63,42,22)-coded 8-PSK scheme over a Rayleigh fading channel show that the proposed algorithm provides a very efficient tradeoff between the decoding complexity and the error performance. Finally,an adaptive scheme for the number of erasures is introduced into the decoding algorithm.展开更多
In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC...In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.展开更多
A high speed and low power Viterbi decoder architecture design based on deep pipelined, clock gating and toggle filtering has been presented in this paper. The Add-Compare-Select (ACS) and Trace Back (TB) units and it...A high speed and low power Viterbi decoder architecture design based on deep pipelined, clock gating and toggle filtering has been presented in this paper. The Add-Compare-Select (ACS) and Trace Back (TB) units and its sub circuits of the decoder have been operated in deep pipelined manner to achieve high transmission rate. The Power dissipation analysis is also investigated and compared with the existing results. The techniques that have been employed in our low-power design are clock-gating and toggle filtering. The synthesized circuits are placed and routed in the standard cell design environment and implemented on a Xilinx XC2VP2fg256-6 FPGA device. Power estimation obtained through gate level simulations indicated that the proposed design reduces the power dissipation of an original Viterbi decoder design by 68.82% and a speed of 145 MHz is achieved.展开更多
An error tolerant hardware efficient verylarge scale integration (VLSI) architecture for bitparallel systolic multiplication over dual base, which canbe pipelined, is presented. Since this architecture has thefeatur...An error tolerant hardware efficient verylarge scale integration (VLSI) architecture for bitparallel systolic multiplication over dual base, which canbe pipelined, is presented. Since this architecture has thefeatures of regularity, modularity and unidirectionaldata flow, this structure is well suited to VLSIimplementations. The length of the largest delay pathand area of this architecture are less compared to the bitparallel systolic multiplication architectures reportedearlier. The architecture is implemented using Austria Micro System's 0.35 μm CMOS (complementary metaloxide semiconductor) technology. This architecture canalso operate over both the dual-base and polynomialbase.展开更多
A memory and driving clock efficient design scheme to achieve WCDMA high-speed channel decoder on a single XILINX’ XVC1000E FPGA chip is presented. Using a modified MAP algorithm, say parallel Sliding Window logarith...A memory and driving clock efficient design scheme to achieve WCDMA high-speed channel decoder on a single XILINX’ XVC1000E FPGA chip is presented. Using a modified MAP algorithm, say parallel Sliding Window logarithmic Maximum A Posterior (PSW-log-MAP), the on-chip turbo decoder can decode an information bit by only an average of two clocks per iteration. On the other hand, a high-parallel pipeline Viterbi algorithm is adopted to realize the 256-state convolutional code decoding. The final decoder with an 8×chip-clock (30 72MHz) driving can concurrently process a data rate up to 2 5Mbps of turbo coded sequences and a data rate over 400kbps of convolutional codes. There is no extern memory needed. Test results show that the decoding performance is only 0 2~0 3dB or less lost comparing to float simulation.展开更多
基金supported by the Fundamental Research Funds for the Central Universities(FRF-TP20-062A1)Guangdong Basic and Applied Basic Research Foundation(2021A1515110070)。
文摘This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.
文摘A multi dimensional concatenation scheme for block codes is introduced, in which information symbols are interleaved and re encoded for more than once. It provides a convenient platform to design high performance codes with flexible interleaver size. Coset based MAP soft in/soft out decoding algorithms are presented for the F24 code. Simulation results show that the proposed coding scheme can achieve high coding gain with flexible interleaver length and very low decoding complexity.
文摘The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.
基金This work was supported in part by National Natural Science Foundation of China(No.61671324)the Director’s Funding from Pilot National Laboratory for Marine Science and Technology(Qingdao)(QNLM201712).
文摘Low-density parity-check(LDPC)codes are widely used due to their significant errorcorrection capability and linear decoding complexity.However,it is not sufficient for LDPC codes to satisfy the ultra low bit error rate(BER)requirement of next-generation ultra-high-speed communications due to the error floor phenomenon.According to the residual error characteristics of LDPC codes,we consider using the high rate Reed-Solomon(RS)codes as the outer codes to construct LDPC-RS product codes to eliminate the error floor and propose the hybrid error-erasure-correction decoding algorithm for the outer code to exploit erasure-correction capability effectively.Furthermore,the overall performance of product codes is improved using iteration between outer and inner codes.Simulation results validate that BER of the product code with the proposed hybrid algorithm is lower than that of the product code with no erasure correction.Compared with other product codes using LDPC codes,the proposed LDPC-RS product code with the same code rate has much better performance and smaller rate loss attributed to the maximum distance separable(MDS)property and significant erasure-correction capability of RS codes.
基金Sponsored by the Ministerial Level Advanced Research Foundation (20304)
文摘A new Chien search method for shortened Reed-Solomon (RS) code is proposed, based on this, a versatile RS decoder for correcting both errors and erasures is designed. Compared with the traditional RS decoder, the weighted coefficient of the Chien search method is calculated sequentially through the three pipelined stages of the decoder. And therefore, the computation of the errata locator polynomial and errata evaluator polynomial needs to be modified. The versatile RS decoder with minimum distance 21 has been synthesized in the Xilinx Virtex-Ⅱ series field programmable gate array (FPGA) xe2v1000-5 and is used by coneatenated coding system for satellite communication. Results show that the maximum data processing rate can be up to 1.3 Gbit/s.
基金supported by the National High-Technology Research and Development Program of China (Grant No.2003AA123310), and the National Natural Science Foundation of China (Grant Nos.60332030, 60572157)
文摘In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.
文摘For communication systems with heavy burst noise, an optimal Forward Error Correction(FEC) scheme is expected to have a large burst error correction capability while simultaneously owning moderate random error correction capability. This letter presents a new FEC scheme based on multiple-symbol interleaved Reed-Solomon codes and an associated two-pass decoding algorithm. It is shown that the proposed multi-symbol interleaved Reed-Solomon scheme can achieve nearly twice as much as the burst error correction capability of conventional single-symbol interleaved Reed-Solomon codes with the same code length and code rate.
文摘The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study the circular-shifting network for decoding LDPC codes with arbitrary Z factor, especially for decoding large Z (Z P) codes, where P is the decoder parallelism. By buffering the P-length slices from the memory, and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for arbitrary sized data shifting consumes only one times of additional resource cost compared to the traditional solution for only maximum P sized data shifting, and achieves significant saving on area and routing complexity.
基金Sponsored by the National Natural Science Foundation of China( Grant No. 61032003)the Fundamental Research Funds for the Central Universities( Grant No. HIT. NSRIF.2012021)
文摘In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and disadvantages of them are listed. A modified overlapped partly parallel decoding that not only inherits the advantages of the two algorithms, but also overcomes the shortcomings of the two algorithms is proposed. The simulation results show that the three kinds of decoding have the same decoding performance; modified overlapped partly parallel decoding improves the iterative convergence rate and the throughput of system.
文摘Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memory occupation when running on a uniprocessor computer. This paper proposes a parallel decoder for linear block codes, using parallel genetic algorithms (PGA). The good performance and time complexity are confirmed by theoretical study and by simulations on BCH(63,30,14) codes over both AWGN and flat Rayleigh fading channels. The simulation results show that the coding gain between parallel and single genetic algorithm is about 0.7 dB at BER = 10﹣5 with only 4 processors.
基金the National Natural Science Foundation of China (No.60272057).
文摘In this paper,a novel dual-metric,the maximum and minimum Squared Euclidean Distance Increment (SEDI) brought by changing the hard decision symbol,is introduced to measure the reli-ability of the received M-ary Phase Shift Keying (MPSK) symbols over a Rayleigh fading channel. Based on the dual-metric,a Chase-type soft decoding algorithm,which is called erased-Chase algorithm,is developed for Reed-Solomon (RS) coded MPSK schemes. The proposed algorithm treats the unre-liable symbols with small maximum SEDI as erasures,and tests the non-erased unreliable symbols with small minimum SEDI as the Chase-2 algorithm does. By introducing optimality test into the decoding procedure,much more reduction in the decoding complexity can be achieved. Simulation results of the RS(63,42,22)-coded 8-PSK scheme over a Rayleigh fading channel show that the proposed algorithm provides a very efficient tradeoff between the decoding complexity and the error performance. Finally,an adaptive scheme for the number of erasures is introduced into the decoding algorithm.
文摘In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.
文摘A high speed and low power Viterbi decoder architecture design based on deep pipelined, clock gating and toggle filtering has been presented in this paper. The Add-Compare-Select (ACS) and Trace Back (TB) units and its sub circuits of the decoder have been operated in deep pipelined manner to achieve high transmission rate. The Power dissipation analysis is also investigated and compared with the existing results. The techniques that have been employed in our low-power design are clock-gating and toggle filtering. The synthesized circuits are placed and routed in the standard cell design environment and implemented on a Xilinx XC2VP2fg256-6 FPGA device. Power estimation obtained through gate level simulations indicated that the proposed design reduces the power dissipation of an original Viterbi decoder design by 68.82% and a speed of 145 MHz is achieved.
文摘An error tolerant hardware efficient verylarge scale integration (VLSI) architecture for bitparallel systolic multiplication over dual base, which canbe pipelined, is presented. Since this architecture has thefeatures of regularity, modularity and unidirectionaldata flow, this structure is well suited to VLSIimplementations. The length of the largest delay pathand area of this architecture are less compared to the bitparallel systolic multiplication architectures reportedearlier. The architecture is implemented using Austria Micro System's 0.35 μm CMOS (complementary metaloxide semiconductor) technology. This architecture canalso operate over both the dual-base and polynomialbase.
文摘A memory and driving clock efficient design scheme to achieve WCDMA high-speed channel decoder on a single XILINX’ XVC1000E FPGA chip is presented. Using a modified MAP algorithm, say parallel Sliding Window logarithmic Maximum A Posterior (PSW-log-MAP), the on-chip turbo decoder can decode an information bit by only an average of two clocks per iteration. On the other hand, a high-parallel pipeline Viterbi algorithm is adopted to realize the 256-state convolutional code decoding. The final decoder with an 8×chip-clock (30 72MHz) driving can concurrently process a data rate up to 2 5Mbps of turbo coded sequences and a data rate over 400kbps of convolutional codes. There is no extern memory needed. Test results show that the decoding performance is only 0 2~0 3dB or less lost comparing to float simulation.