This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Syste...This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.展开更多
In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete mem...In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete memoryless channels(BDMCs),the proposed decoders quantize the virtual subchannels of polar codes to maximize mutual information(MMI)between source bits and quantized symbols.The nested structure of polar codes ensures that the MMI quantization can be implemented stage by stage.Simulation results show that the proposed MMI decoders with 4 quantization bits outperform the existing nonuniform quantized decoders that minimize mean-squared error(MMSE)with 4 quantization bits,and yield even better performance than uniform MMI quantized decoders with 5 quantization bits.Furthermore,the proposed 5-bit quantized MMI decoders approach the floating-point decoders with negligible performance loss.展开更多
Belief propagation list(BPL) decoding for polar codes has attracted more attention due to its inherent parallel nature. However, a large gap still exists with CRC-aided SCL(CA-SCL) decoding.In this work, an improved s...Belief propagation list(BPL) decoding for polar codes has attracted more attention due to its inherent parallel nature. However, a large gap still exists with CRC-aided SCL(CA-SCL) decoding.In this work, an improved segmented belief propagation list decoding based on bit flipping(SBPL-BF) is proposed. On the one hand, the proposed algorithm makes use of the cooperative characteristic in BPL decoding such that the codeword is decoded in different BP decoders. Based on this characteristic, the unreliable bits for flipping could be split into multiple subblocks and could be flipped in different decoders simultaneously. On the other hand, a more flexible and effective processing strategy for the priori information of the unfrozen bits that do not need to be flipped is designed to improve the decoding convergence. In addition, this is the first proposal in BPL decoding which jointly optimizes the bit flipping of the information bits and the code bits. In particular, for bit flipping of the code bits, a H-matrix aided bit-flipping algorithm is designed to enhance the accuracy in identifying erroneous code bits. The simulation results show that the proposed algorithm significantly improves the errorcorrection performance of BPL decoding for medium and long codes. It is more than 0.25 d B better than the state-of-the-art BPL decoding at a block error rate(BLER) of 10^(-5), and outperforms CA-SCL decoding in the low signal-to-noise(SNR) region for(1024, 0.5)polar codes.展开更多
The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered de...The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study the circular-shifting network for decoding LDPC codes with arbitrary Z factor, especially for decoding large Z (Z P) codes, where P is the decoder parallelism. By buffering the P-length slices from the memory, and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for arbitrary sized data shifting consumes only one times of additional resource cost compared to the traditional solution for only maximum P sized data shifting, and achieves significant saving on area and routing complexity.展开更多
A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not ...A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not a power of two, the modified Benes network can achieve the most optimal performance. This modified Benes network is non-blocking and can perform any sorts of permutations, so it can support 19 modes specified in the WiMAX system. Furthermore, an efficient algorithm to generate the control signals for all the 2 × 2 switches in this network is derived, which can reduce the hardware complexity and overall latency of the modified Benes network. Synthesis results show that the proposed control signal generator can save 25.4% chip area and the overall network latency can be reduced by 36. 2%.展开更多
Recently,a generalized successive cancellation list(SCL)decoder implemented with shiftedpruning(SP)scheme,namely the SCL-SP-ωdecoder,is presented for polar codes,which is able to shift the pruning window at mostωtim...Recently,a generalized successive cancellation list(SCL)decoder implemented with shiftedpruning(SP)scheme,namely the SCL-SP-ωdecoder,is presented for polar codes,which is able to shift the pruning window at mostωtimes during each SCL re-decoding attempt to prevent the correct path from being eliminated.The candidate positions for applying the SP scheme are selected by a shifting metric based on the probability that the elimination occurs.However,the number of exponential/logarithm operations involved in the SCL-SP-ωdecoder grows linearly with the number of information bits and list size,which leads to high computational complexity.In this paper,we present a detailed analysis of the SCL-SP-ωdecoder in terms of the decoding performance and complexity,which unveils that the choice of the shifting metric is essential for improving the decoding performance and reducing the re-decoding attempts simultaneously.Then,we introduce a simplified metric derived from the path metric(PM)domain,and a custom-tailored deep learning(DL)network is further designed to enhance the efficiency of the proposed simplified metric.The proposed metrics are both free of transcendental functions and hence,are more hardware-friendly than the existing metrics.Simulation results show that the proposed DL-aided metric provides the best error correction performance as comparison with the state of the art.展开更多
Belief propagation(BP)decoding outputs soft information and can be naturally used in iterative receivers.BP list(BPL)decoding provides comparable error-correction performance to the successive cancellation list(SCL)de...Belief propagation(BP)decoding outputs soft information and can be naturally used in iterative receivers.BP list(BPL)decoding provides comparable error-correction performance to the successive cancellation list(SCL)decoding.In this paper,we firstly introduce an enhanced code construction scheme for BPL decoding to improve its errorcorrection capability.Then,a GPU-based BPL decoder with adoption of the new code construction is presented.Finally,the proposed BPL decoder is tested on NVIDIA RTX3070 and GTX1060.Experimental results show that the presented BPL decoder with early termination criterion achieves above 1 Gbps throughput on RTX3070 for the code(1024,512)with 32 lists under good channel conditions.展开更多
For polar codes,the performance of successive cancellation list(SCL)decoding is capable of approaching that of maximum likelihood decoding.However,the existing hardware architectures for the SCL decoding suffer from h...For polar codes,the performance of successive cancellation list(SCL)decoding is capable of approaching that of maximum likelihood decoding.However,the existing hardware architectures for the SCL decoding suffer from high hardware complexity due to calculating L decoding paths simultaneously,which are unfriendly to the devices with limited logical resources,such as field programmable gate arrays(FPGAs).In this paper,we propose a list-serial pipelined hardware architecture with low complexity for the SCL decoding,where the serial calculation and the pipelined operation are elegantly combined to strike a balance between the complexity and the latency.Moreover,we employ only one successive cancellation(SC)decoder core without L×L crossbars,and reduce the number of inputs of the metric sorter from 2L to L+2.Finally,the FPGA implementations show that the hardware resource consumption is significantly reduced with negligible decoding performance loss.展开更多
Decoding by alternating direction method of multipliers(ADMM) is a promising linear programming decoder for low-density parity-check(LDPC) codes. In this paper, we propose a two-step scheme to lower the error floor of...Decoding by alternating direction method of multipliers(ADMM) is a promising linear programming decoder for low-density parity-check(LDPC) codes. In this paper, we propose a two-step scheme to lower the error floor of LDPC codes with ADMM penalized decoder.For the undetected errors that cannot be avoided at the decoder side, we modify the code structure slightly to eliminate low-weight code words. For the detected errors induced by small error-prone structures, we propose a post-processing method for the ADMM penalized decoder. Simulation results show that the error floor can be reduced significantly over three illustrated LDPC codes by the proposed two-step scheme.展开更多
This paper presents an efficient VLSI architecture of the contest-based adaptive variable length code (CAVLC) decoder with power optimized for the H.264/advanced video coding (AVC) standard. In the proposed design...This paper presents an efficient VLSI architecture of the contest-based adaptive variable length code (CAVLC) decoder with power optimized for the H.264/advanced video coding (AVC) standard. In the proposed design, according to the regularity of the codewords, the first one detector is used to solve the low efficiency and high power dissipation problem within the traditional method of table-searching. Considering the relevance of the data used in the process of runbefore's decoding, arithmetic operation is combined with finite state machine (FSM), which achieves higher decoding efficiency. According to the CAVLC decoding flow, clock gating is employed in the module level and the register level respectively, which reduces 43% of the overall dynamic power dissipation. The proposed design can decode every syntax element in one clock cycle. When the proposed design is synthesized at the clock constraint of 100 MHz, the synthesis result shows that the design costs 11 300 gates under a 0.25 μm CMOS technology, which meets the demand of real time decoding in the H.264/AVC standard.展开更多
In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC...In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.展开更多
Quantum error correction technology is an important solution to solve the noise interference generated during the operation of quantum computers.In order to find the best syndrome of the stabilizer code in quantum err...Quantum error correction technology is an important solution to solve the noise interference generated during the operation of quantum computers.In order to find the best syndrome of the stabilizer code in quantum error correction,we need to find a fast and close to the optimal threshold decoder.In this work,we build a convolutional neural network(CNN)decoder to correct errors in the toric code based on the system research of machine learning.We analyze and optimize various conditions that affect CNN,and use the RestNet network architecture to reduce the running time.It is shortened by 30%-40%,and we finally design an optimized algorithm for CNN decoder.In this way,the threshold accuracy of the neural network decoder is made to reach 10.8%,which is closer to the optimal threshold of about 11%.The previous threshold of 8.9%-10.3%has been slightly improved,and there is no need to verify the basic noise.展开更多
Offset Shuffle Networks(OSNs) interleave a-posterior probability messages in the Block Row-Layered Decoder(BRLD) of QuasiCyclic Low-Density Parity-Check(QC-LDPC)codes.However,OSNs usually consume a significant amount ...Offset Shuffle Networks(OSNs) interleave a-posterior probability messages in the Block Row-Layered Decoder(BRLD) of QuasiCyclic Low-Density Parity-Check(QC-LDPC)codes.However,OSNs usually consume a significant amount of computational resources and limit the clock frequency,particularly when the size of the Circulant Permutation Matrix(CPM)is large.To simplify the architecture of the OSN,we propose a Simplified Offset Shuffle Network Block Progressive Edge-Growth(SOSNBPEG) algorithm to construct a class of QCLDPC codes.The SOSN-BPEG algorithm constrains the shift values of CPMs and the difference of the shift values in the same column by progressively appending check nodes.Simulation results indicate that the error performance of the SOSN-BPEG codes is the same as that of the codes in WiMAX and DVB-S2.The SOSNBPEG codes can reduce the complexity of the OSNs by up to 54.3%,and can improve the maximum frequency by up to 21.7%for various code lengths and rates.展开更多
A new Chien search method for shortened Reed-Solomon (RS) code is proposed, based on this, a versatile RS decoder for correcting both errors and erasures is designed. Compared with the traditional RS decoder, the we...A new Chien search method for shortened Reed-Solomon (RS) code is proposed, based on this, a versatile RS decoder for correcting both errors and erasures is designed. Compared with the traditional RS decoder, the weighted coefficient of the Chien search method is calculated sequentially through the three pipelined stages of the decoder. And therefore, the computation of the errata locator polynomial and errata evaluator polynomial needs to be modified. The versatile RS decoder with minimum distance 21 has been synthesized in the Xilinx Virtex-Ⅱ series field programmable gate array (FPGA) xe2v1000-5 and is used by coneatenated coding system for satellite communication. Results show that the maximum data processing rate can be up to 1.3 Gbit/s.展开更多
In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural o...In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.展开更多
Genetic algorithms are successfully used for decoding some classes of error correcting codes, and offer very good performances for solving large optimization problems. This article proposes a new decoder based on Seri...Genetic algorithms are successfully used for decoding some classes of error correcting codes, and offer very good performances for solving large optimization problems. This article proposes a new decoder based on Serial Genetic Algorithm Decoder (SGAD) for decoding Low Density Parity Check (LDPC) codes. The results show that the proposed algorithm gives large gains over sum-product decoder, which proves its efficiency.展开更多
This paper presented a concatenated maximum-likelihood (ML) decoder for space-time/space-frequency block coded orthogonal frequency diversion multiplexing (ST/SFBC-OFDM) systems in double selective fading channels. Th...This paper presented a concatenated maximum-likelihood (ML) decoder for space-time/space-frequency block coded orthogonal frequency diversion multiplexing (ST/SFBC-OFDM) systems in double selective fading channels. The proposed decoder first detects space-time or space-frequency codeword elements separately. Then, according to the coarsely estimated codeword elements, the ML decoding is performed in a smaller constellation element set to searching final codeword. It is proved that the proposed decoder has optimal performances if and only if subchannels are constant during a codeword interval. The simulation results show that the performances of proposed decoder is close to that of the optimal ML decoder in severe Doppler and delay spread channels. However, the complexity of proposed decoder is much lower than that of the optimal ML decoder.展开更多
Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memor...Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memory occupation when running on a uniprocessor computer. This paper proposes a parallel decoder for linear block codes, using parallel genetic algorithms (PGA). The good performance and time complexity are confirmed by theoretical study and by simulations on BCH(63,30,14) codes over both AWGN and flat Rayleigh fading channels. The simulation results show that the coding gain between parallel and single genetic algorithm is about 0.7 dB at BER = 10﹣5 with only 4 processors.展开更多
基金supported by the Fundamental Research Funds for the Central Universities(FRF-TP20-062A1)Guangdong Basic and Applied Basic Research Foundation(2021A1515110070)。
文摘This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.
基金financially supported in part by National Key R&D Program of China(No.2018YFB1801402)in part by Huawei Technologies Co.,Ltd.
文摘In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete memoryless channels(BDMCs),the proposed decoders quantize the virtual subchannels of polar codes to maximize mutual information(MMI)between source bits and quantized symbols.The nested structure of polar codes ensures that the MMI quantization can be implemented stage by stage.Simulation results show that the proposed MMI decoders with 4 quantization bits outperform the existing nonuniform quantized decoders that minimize mean-squared error(MMSE)with 4 quantization bits,and yield even better performance than uniform MMI quantized decoders with 5 quantization bits.Furthermore,the proposed 5-bit quantized MMI decoders approach the floating-point decoders with negligible performance loss.
基金funded by the Key Project of NSFC-Guangdong Province Joint Program(Grant No.U2001204)the National Natural Science Foundation of China(Grant Nos.61873290 and 61972431)+1 种基金the Science and Technology Program of Guangzhou,China(Grant No.202002030470)the Funding Project of Featured Major of Guangzhou Xinhua University(2021TZ002).
文摘Belief propagation list(BPL) decoding for polar codes has attracted more attention due to its inherent parallel nature. However, a large gap still exists with CRC-aided SCL(CA-SCL) decoding.In this work, an improved segmented belief propagation list decoding based on bit flipping(SBPL-BF) is proposed. On the one hand, the proposed algorithm makes use of the cooperative characteristic in BPL decoding such that the codeword is decoded in different BP decoders. Based on this characteristic, the unreliable bits for flipping could be split into multiple subblocks and could be flipped in different decoders simultaneously. On the other hand, a more flexible and effective processing strategy for the priori information of the unfrozen bits that do not need to be flipped is designed to improve the decoding convergence. In addition, this is the first proposal in BPL decoding which jointly optimizes the bit flipping of the information bits and the code bits. In particular, for bit flipping of the code bits, a H-matrix aided bit-flipping algorithm is designed to enhance the accuracy in identifying erroneous code bits. The simulation results show that the proposed algorithm significantly improves the errorcorrection performance of BPL decoding for medium and long codes. It is more than 0.25 d B better than the state-of-the-art BPL decoding at a block error rate(BLER) of 10^(-5), and outperforms CA-SCL decoding in the low signal-to-noise(SNR) region for(1024, 0.5)polar codes.
文摘The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study the circular-shifting network for decoding LDPC codes with arbitrary Z factor, especially for decoding large Z (Z P) codes, where P is the decoder parallelism. By buffering the P-length slices from the memory, and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for arbitrary sized data shifting consumes only one times of additional resource cost compared to the traditional solution for only maximum P sized data shifting, and achieves significant saving on area and routing complexity.
基金The National Natural Science Foundation of China(No.60871079)
文摘A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not a power of two, the modified Benes network can achieve the most optimal performance. This modified Benes network is non-blocking and can perform any sorts of permutations, so it can support 19 modes specified in the WiMAX system. Furthermore, an efficient algorithm to generate the control signals for all the 2 × 2 switches in this network is derived, which can reduce the hardware complexity and overall latency of the modified Benes network. Synthesis results show that the proposed control signal generator can save 25.4% chip area and the overall network latency can be reduced by 36. 2%.
基金supported in part by the National Key Research and Development Program of China under Grant 2018YFB1802303in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LQ20F010010。
文摘Recently,a generalized successive cancellation list(SCL)decoder implemented with shiftedpruning(SP)scheme,namely the SCL-SP-ωdecoder,is presented for polar codes,which is able to shift the pruning window at mostωtimes during each SCL re-decoding attempt to prevent the correct path from being eliminated.The candidate positions for applying the SP scheme are selected by a shifting metric based on the probability that the elimination occurs.However,the number of exponential/logarithm operations involved in the SCL-SP-ωdecoder grows linearly with the number of information bits and list size,which leads to high computational complexity.In this paper,we present a detailed analysis of the SCL-SP-ωdecoder in terms of the decoding performance and complexity,which unveils that the choice of the shifting metric is essential for improving the decoding performance and reducing the re-decoding attempts simultaneously.Then,we introduce a simplified metric derived from the path metric(PM)domain,and a custom-tailored deep learning(DL)network is further designed to enhance the efficiency of the proposed simplified metric.The proposed metrics are both free of transcendental functions and hence,are more hardware-friendly than the existing metrics.Simulation results show that the proposed DL-aided metric provides the best error correction performance as comparison with the state of the art.
基金supported by the Fundamental Research Funds for the Central Universities (FRF-TP20-062A1)Guangdong Basic and Applied Basic Research Foundation (2021A1515110070)
文摘Belief propagation(BP)decoding outputs soft information and can be naturally used in iterative receivers.BP list(BPL)decoding provides comparable error-correction performance to the successive cancellation list(SCL)decoding.In this paper,we firstly introduce an enhanced code construction scheme for BPL decoding to improve its errorcorrection capability.Then,a GPU-based BPL decoder with adoption of the new code construction is presented.Finally,the proposed BPL decoder is tested on NVIDIA RTX3070 and GTX1060.Experimental results show that the presented BPL decoder with early termination criterion achieves above 1 Gbps throughput on RTX3070 for the code(1024,512)with 32 lists under good channel conditions.
基金supported in part by the National Key R&D Program of China(No.2019YFB1803400)。
文摘For polar codes,the performance of successive cancellation list(SCL)decoding is capable of approaching that of maximum likelihood decoding.However,the existing hardware architectures for the SCL decoding suffer from high hardware complexity due to calculating L decoding paths simultaneously,which are unfriendly to the devices with limited logical resources,such as field programmable gate arrays(FPGAs).In this paper,we propose a list-serial pipelined hardware architecture with low complexity for the SCL decoding,where the serial calculation and the pipelined operation are elegantly combined to strike a balance between the complexity and the latency.Moreover,we employ only one successive cancellation(SC)decoder core without L×L crossbars,and reduce the number of inputs of the metric sorter from 2L to L+2.Finally,the FPGA implementations show that the hardware resource consumption is significantly reduced with negligible decoding performance loss.
基金supported in part by National Nature Science Foundation of China under Grant No.61471286,No.61271004the Fundamental Research Funds for the Central Universitiesthe open research fund of Key Laboratory of Information Coding and Transmission,Southwest Jiaotong University(No.2010-03)
文摘Decoding by alternating direction method of multipliers(ADMM) is a promising linear programming decoder for low-density parity-check(LDPC) codes. In this paper, we propose a two-step scheme to lower the error floor of LDPC codes with ADMM penalized decoder.For the undetected errors that cannot be avoided at the decoder side, we modify the code structure slightly to eliminate low-weight code words. For the detected errors induced by small error-prone structures, we propose a post-processing method for the ADMM penalized decoder. Simulation results show that the error floor can be reduced significantly over three illustrated LDPC codes by the proposed two-step scheme.
基金Project supported by the Applied Materials Shanghai Research and Development Foundation (Grant No.08700741000)the Foundation of Shanghai Municipal Education Commission (Grant No.2006AZ068)
文摘This paper presents an efficient VLSI architecture of the contest-based adaptive variable length code (CAVLC) decoder with power optimized for the H.264/advanced video coding (AVC) standard. In the proposed design, according to the regularity of the codewords, the first one detector is used to solve the low efficiency and high power dissipation problem within the traditional method of table-searching. Considering the relevance of the data used in the process of runbefore's decoding, arithmetic operation is combined with finite state machine (FSM), which achieves higher decoding efficiency. According to the CAVLC decoding flow, clock gating is employed in the module level and the register level respectively, which reduces 43% of the overall dynamic power dissipation. The proposed design can decode every syntax element in one clock cycle. When the proposed design is synthesized at the clock constraint of 100 MHz, the synthesis result shows that the design costs 11 300 gates under a 0.25 μm CMOS technology, which meets the demand of real time decoding in the H.264/AVC standard.
文摘In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.
基金the National Natural Science Foundation of China(Grant Nos.11975132 and 61772295)the Natural Science Foundation of Shandong Province,China(Grant No.ZR2019YQ01)the Project of Shandong Province Higher Educational Science and Technology Program,China(Grant No.J18KZ012).
文摘Quantum error correction technology is an important solution to solve the noise interference generated during the operation of quantum computers.In order to find the best syndrome of the stabilizer code in quantum error correction,we need to find a fast and close to the optimal threshold decoder.In this work,we build a convolutional neural network(CNN)decoder to correct errors in the toric code based on the system research of machine learning.We analyze and optimize various conditions that affect CNN,and use the RestNet network architecture to reduce the running time.It is shortened by 30%-40%,and we finally design an optimized algorithm for CNN decoder.In this way,the threshold accuracy of the neural network decoder is made to reach 10.8%,which is closer to the optimal threshold of about 11%.The previous threshold of 8.9%-10.3%has been slightly improved,and there is no need to verify the basic noise.
基金supported by the National Natural Science Foundation of China under Grant No.61071083
文摘Offset Shuffle Networks(OSNs) interleave a-posterior probability messages in the Block Row-Layered Decoder(BRLD) of QuasiCyclic Low-Density Parity-Check(QC-LDPC)codes.However,OSNs usually consume a significant amount of computational resources and limit the clock frequency,particularly when the size of the Circulant Permutation Matrix(CPM)is large.To simplify the architecture of the OSN,we propose a Simplified Offset Shuffle Network Block Progressive Edge-Growth(SOSNBPEG) algorithm to construct a class of QCLDPC codes.The SOSN-BPEG algorithm constrains the shift values of CPMs and the difference of the shift values in the same column by progressively appending check nodes.Simulation results indicate that the error performance of the SOSN-BPEG codes is the same as that of the codes in WiMAX and DVB-S2.The SOSNBPEG codes can reduce the complexity of the OSNs by up to 54.3%,and can improve the maximum frequency by up to 21.7%for various code lengths and rates.
基金Sponsored by the Ministerial Level Advanced Research Foundation (20304)
文摘A new Chien search method for shortened Reed-Solomon (RS) code is proposed, based on this, a versatile RS decoder for correcting both errors and erasures is designed. Compared with the traditional RS decoder, the weighted coefficient of the Chien search method is calculated sequentially through the three pipelined stages of the decoder. And therefore, the computation of the errata locator polynomial and errata evaluator polynomial needs to be modified. The versatile RS decoder with minimum distance 21 has been synthesized in the Xilinx Virtex-Ⅱ series field programmable gate array (FPGA) xe2v1000-5 and is used by coneatenated coding system for satellite communication. Results show that the maximum data processing rate can be up to 1.3 Gbit/s.
基金supported by the National High-Technology Research and Development Program of China (Grant No.2003AA123310), and the National Natural Science Foundation of China (Grant Nos.60332030, 60572157)
文摘In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.
文摘Genetic algorithms are successfully used for decoding some classes of error correcting codes, and offer very good performances for solving large optimization problems. This article proposes a new decoder based on Serial Genetic Algorithm Decoder (SGAD) for decoding Low Density Parity Check (LDPC) codes. The results show that the proposed algorithm gives large gains over sum-product decoder, which proves its efficiency.
文摘This paper presented a concatenated maximum-likelihood (ML) decoder for space-time/space-frequency block coded orthogonal frequency diversion multiplexing (ST/SFBC-OFDM) systems in double selective fading channels. The proposed decoder first detects space-time or space-frequency codeword elements separately. Then, according to the coarsely estimated codeword elements, the ML decoding is performed in a smaller constellation element set to searching final codeword. It is proved that the proposed decoder has optimal performances if and only if subchannels are constant during a codeword interval. The simulation results show that the performances of proposed decoder is close to that of the optimal ML decoder in severe Doppler and delay spread channels. However, the complexity of proposed decoder is much lower than that of the optimal ML decoder.
文摘Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memory occupation when running on a uniprocessor computer. This paper proposes a parallel decoder for linear block codes, using parallel genetic algorithms (PGA). The good performance and time complexity are confirmed by theoretical study and by simulations on BCH(63,30,14) codes over both AWGN and flat Rayleigh fading channels. The simulation results show that the coding gain between parallel and single genetic algorithm is about 0.7 dB at BER = 10﹣5 with only 4 processors.