This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Syste...This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.展开更多
In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete mem...In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete memoryless channels(BDMCs),the proposed decoders quantize the virtual subchannels of polar codes to maximize mutual information(MMI)between source bits and quantized symbols.The nested structure of polar codes ensures that the MMI quantization can be implemented stage by stage.Simulation results show that the proposed MMI decoders with 4 quantization bits outperform the existing nonuniform quantized decoders that minimize mean-squared error(MMSE)with 4 quantization bits,and yield even better performance than uniform MMI quantized decoders with 5 quantization bits.Furthermore,the proposed 5-bit quantized MMI decoders approach the floating-point decoders with negligible performance loss.展开更多
A global optimization algorithm (GOA) for parallel Chien search circuit in Reed-Solomon (RS) (255,239) decoder is presented. By finding out the common modulo 2 additions within groups of Galois field (GF) mult...A global optimization algorithm (GOA) for parallel Chien search circuit in Reed-Solomon (RS) (255,239) decoder is presented. By finding out the common modulo 2 additions within groups of Galois field (GF) multipliers and pre-computing the common items, the GOA can reduce the number of XOR gates efficiently and thus reduce the circuit area. Different from other local optimization algorithms, the GOA is a global one. When there are more than one maximum matches at a time, the best match choice in the GOA has the least impact on the final result by only choosing the pair with the smallest relational value instead of choosing a pair randomly. The results show that the area of parallel Chien search circuits can be reduced by 51% compared to the direct implementation when the group-based GOA is used for GF multipliers and by 26% if applying the GOA to GF multipliers separately. This optimization scheme can be widely used in general parallel architecture in which many GF multipliers are involved.展开更多
The first domestic total dose hardened 2μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8- line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transisto...The first domestic total dose hardened 2μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8- line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transistor threshold voltage shifts,circuit static leakage currents,and I-V curves as a function of total dose up to 3× 10^5rad(Si). The worst case threshold voltage shifts of the front channels are less than 20mV for nMOS transistors at 3 × 10^5rad(Si) and follow-up irradiation and less than 70mV for the pMOS transistors. Furthermore, no significant radiation induced leakage currents and functional degeneration are observed.展开更多
A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not ...A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not a power of two, the modified Benes network can achieve the most optimal performance. This modified Benes network is non-blocking and can perform any sorts of permutations, so it can support 19 modes specified in the WiMAX system. Furthermore, an efficient algorithm to generate the control signals for all the 2 × 2 switches in this network is derived, which can reduce the hardware complexity and overall latency of the modified Benes network. Synthesis results show that the proposed control signal generator can save 25.4% chip area and the overall network latency can be reduced by 36. 2%.展开更多
Viterbi decoding is widely used in many radio systems. Because of the large computation complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development...Viterbi decoding is widely used in many radio systems. Because of the large computation complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of the multicore technology, multicore platforms become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-art multi-core processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft input Viterbi decoder for WiMAX SR Baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell Processor running at 3.2GHz, our Viterbi decoder can achieve the throughput up to 30Mb/s to decode the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated to the SR system and can provide a highly integrated SR solution. The optimization methodology in this module design can be extended to other modules on Cell platform.展开更多
Decoding by alternating direction method of multipliers(ADMM) is a promising linear programming decoder for low-density parity-check(LDPC) codes. In this paper, we propose a two-step scheme to lower the error floor of...Decoding by alternating direction method of multipliers(ADMM) is a promising linear programming decoder for low-density parity-check(LDPC) codes. In this paper, we propose a two-step scheme to lower the error floor of LDPC codes with ADMM penalized decoder.For the undetected errors that cannot be avoided at the decoder side, we modify the code structure slightly to eliminate low-weight code words. For the detected errors induced by small error-prone structures, we propose a post-processing method for the ADMM penalized decoder. Simulation results show that the error floor can be reduced significantly over three illustrated LDPC codes by the proposed two-step scheme.展开更多
Modern satellite communication systems require on-board processing(OBP)for performance improvements,and SRAM-FPGAs are an attractive option for OBP implementation.However,SRAM-FPGAs are sensitive to radiation effects,...Modern satellite communication systems require on-board processing(OBP)for performance improvements,and SRAM-FPGAs are an attractive option for OBP implementation.However,SRAM-FPGAs are sensitive to radiation effects,among which single event upsets(SEUs)are important as they can lead to data corruption and system failure.This paper studies the fault tolerance capability of a SRAM-FPGA implemented Viterbi decoder to SEUs on the user memory.Analysis and fault injection experiments are conducted to verify that over 97%of the SEUs on user memory would not lead to output errors.To achieve a better reliability,selective protection schemes are then proposed to further improve the reliability of the decoder to SEUs on user memory with very small overhead.Although the results are obtained for a specific FPGA implementation,the developed reliability estimation model and the general conclusions still hold for other implementations.展开更多
An improved list sphere decoder (ILSD) is proposed based on the conventional list sphere decoder (LSD) and the reduced- complexity maximum likelihood sphere-decoding algorithm. Unlike the conventional LSD with fix...An improved list sphere decoder (ILSD) is proposed based on the conventional list sphere decoder (LSD) and the reduced- complexity maximum likelihood sphere-decoding algorithm. Unlike the conventional LSD with fixed initial radius, the ILSD adopts an adaptive radius to accelerate the list cdnstruction. Characterized by low-complexity and radius-insensitivity, the proposed algorithm makes iterative joint detection and decoding more realizable in multiple-antenna systems. Simulation results show that computational savings of ILSD over LSD are more apparent with more transmit antennas or larger constellations, and with no performance degradation. Because the complexity of the ILSD algorithm almost keeps invariant with the increasing of initial radius, the BER performance can be improved by selecting a sufficiently large radius.展开更多
A high performance SDRAM controller for HDTV decoder is designed. MB-based ( macro block) address mapping, adaptive-precharge and command interleaving are adopted in this controller. MB-based address mapping reduces...A high performance SDRAM controller for HDTV decoder is designed. MB-based ( macro block) address mapping, adaptive-precharge and command interleaving are adopted in this controller. MB-based address mapping reduces the precharge operations of the video processing unit in one access; adaptive- precharge avoids unnecessary precharge operations; while command interleaving inserts the precharge and activate commands of the next access into the command sequence of the current access, thus reduces the no operation (NOP) cycles. Combination of these three schemes effectively improves the SDRAM performance. Compared with precharge-all scheme, adaptive-precharge and command interleaving reduce the SDRAM overhead cycles by 70% and increases SDRAM performance by up to 19.2% in the best case. This controller has been implemented in an AVS SoC and the frequency is 200MHz.展开更多
This paper presents an efficient VLSI architecture of the contest-based adaptive variable length code (CAVLC) decoder with power optimized for the H.264/advanced video coding (AVC) standard. In the proposed design...This paper presents an efficient VLSI architecture of the contest-based adaptive variable length code (CAVLC) decoder with power optimized for the H.264/advanced video coding (AVC) standard. In the proposed design, according to the regularity of the codewords, the first one detector is used to solve the low efficiency and high power dissipation problem within the traditional method of table-searching. Considering the relevance of the data used in the process of runbefore's decoding, arithmetic operation is combined with finite state machine (FSM), which achieves higher decoding efficiency. According to the CAVLC decoding flow, clock gating is employed in the module level and the register level respectively, which reduces 43% of the overall dynamic power dissipation. The proposed design can decode every syntax element in one clock cycle. When the proposed design is synthesized at the clock constraint of 100 MHz, the synthesis result shows that the design costs 11 300 gates under a 0.25 μm CMOS technology, which meets the demand of real time decoding in the H.264/AVC standard.展开更多
Quantum error-correction codes are immeasurable resources for quantum computing and quantum communication.However,the existing decoders are generally incapable of checking node duplication of belief propagation(BP)on ...Quantum error-correction codes are immeasurable resources for quantum computing and quantum communication.However,the existing decoders are generally incapable of checking node duplication of belief propagation(BP)on quantum low-density parity check(QLDPC)codes.Based on the probability theory in the machine learning,mathematical statistics and topological structure,a GF(4)(the Galois field is abbreviated as GF)augmented model BP decoder with Tanner graph is designed.The problem of repeated check nodes can be solved by this decoder.In simulation,when the random perturbation strength p=0.0115-0.0116 and number of attempts N=60-70,the highest decoding efficiency of the augmented model BP decoder is obtained,and the low-loss frame error rate(FER)decreases to 7.1975×10^(-5).Hence,we design a novel augmented model decoder to compare the relationship between GF(2)and GF(4)for quantum code[[450,200]]on the depolarization channel.It can be verified that the proposed decoder provides the widely application range,and the decoding performance is better in QLDPC codes.展开更多
In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC...In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.展开更多
基金supported by the Fundamental Research Funds for the Central Universities(FRF-TP20-062A1)Guangdong Basic and Applied Basic Research Foundation(2021A1515110070)。
文摘This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.
基金financially supported in part by National Key R&D Program of China(No.2018YFB1801402)in part by Huawei Technologies Co.,Ltd.
文摘In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete memoryless channels(BDMCs),the proposed decoders quantize the virtual subchannels of polar codes to maximize mutual information(MMI)between source bits and quantized symbols.The nested structure of polar codes ensures that the MMI quantization can be implemented stage by stage.Simulation results show that the proposed MMI decoders with 4 quantization bits outperform the existing nonuniform quantized decoders that minimize mean-squared error(MMSE)with 4 quantization bits,and yield even better performance than uniform MMI quantized decoders with 5 quantization bits.Furthermore,the proposed 5-bit quantized MMI decoders approach the floating-point decoders with negligible performance loss.
文摘A global optimization algorithm (GOA) for parallel Chien search circuit in Reed-Solomon (RS) (255,239) decoder is presented. By finding out the common modulo 2 additions within groups of Galois field (GF) multipliers and pre-computing the common items, the GOA can reduce the number of XOR gates efficiently and thus reduce the circuit area. Different from other local optimization algorithms, the GOA is a global one. When there are more than one maximum matches at a time, the best match choice in the GOA has the least impact on the final result by only choosing the pair with the smallest relational value instead of choosing a pair randomly. The results show that the area of parallel Chien search circuits can be reduced by 51% compared to the direct implementation when the group-based GOA is used for GF multipliers and by 26% if applying the GOA to GF multipliers separately. This optimization scheme can be widely used in general parallel architecture in which many GF multipliers are involved.
文摘The first domestic total dose hardened 2μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8- line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transistor threshold voltage shifts,circuit static leakage currents,and I-V curves as a function of total dose up to 3× 10^5rad(Si). The worst case threshold voltage shifts of the front channels are less than 20mV for nMOS transistors at 3 × 10^5rad(Si) and follow-up irradiation and less than 70mV for the pMOS transistors. Furthermore, no significant radiation induced leakage currents and functional degeneration are observed.
基金The National Natural Science Foundation of China(No.60871079)
文摘A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not a power of two, the modified Benes network can achieve the most optimal performance. This modified Benes network is non-blocking and can perform any sorts of permutations, so it can support 19 modes specified in the WiMAX system. Furthermore, an efficient algorithm to generate the control signals for all the 2 × 2 switches in this network is derived, which can reduce the hardware complexity and overall latency of the modified Benes network. Synthesis results show that the proposed control signal generator can save 25.4% chip area and the overall network latency can be reduced by 36. 2%.
文摘Viterbi decoding is widely used in many radio systems. Because of the large computation complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of the multicore technology, multicore platforms become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-art multi-core processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft input Viterbi decoder for WiMAX SR Baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell Processor running at 3.2GHz, our Viterbi decoder can achieve the throughput up to 30Mb/s to decode the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated to the SR system and can provide a highly integrated SR solution. The optimization methodology in this module design can be extended to other modules on Cell platform.
基金supported in part by National Nature Science Foundation of China under Grant No.61471286,No.61271004the Fundamental Research Funds for the Central Universitiesthe open research fund of Key Laboratory of Information Coding and Transmission,Southwest Jiaotong University(No.2010-03)
文摘Decoding by alternating direction method of multipliers(ADMM) is a promising linear programming decoder for low-density parity-check(LDPC) codes. In this paper, we propose a two-step scheme to lower the error floor of LDPC codes with ADMM penalized decoder.For the undetected errors that cannot be avoided at the decoder side, we modify the code structure slightly to eliminate low-weight code words. For the detected errors induced by small error-prone structures, we propose a post-processing method for the ADMM penalized decoder. Simulation results show that the error floor can be reduced significantly over three illustrated LDPC codes by the proposed two-step scheme.
基金supported in part by the National Key R&D Program(Grant No.2017YFE0121300)in part by the National Natural Science Foundation of China (Grant No. 61501321)+1 种基金in part by Tianjin science and technology program (Grant No. 17ZXRGGX00160)the support of the TEXEO project TEC201680339R funded by the Spanish Ministry of Economy and Competitivity
文摘Modern satellite communication systems require on-board processing(OBP)for performance improvements,and SRAM-FPGAs are an attractive option for OBP implementation.However,SRAM-FPGAs are sensitive to radiation effects,among which single event upsets(SEUs)are important as they can lead to data corruption and system failure.This paper studies the fault tolerance capability of a SRAM-FPGA implemented Viterbi decoder to SEUs on the user memory.Analysis and fault injection experiments are conducted to verify that over 97%of the SEUs on user memory would not lead to output errors.To achieve a better reliability,selective protection schemes are then proposed to further improve the reliability of the decoder to SEUs on user memory with very small overhead.Although the results are obtained for a specific FPGA implementation,the developed reliability estimation model and the general conclusions still hold for other implementations.
基金The National Natural Science Founda-tion of China ( No 60496316)the National Hi-Tech Re-search and Development Program (863) of China (No2006-AA01Z270)
文摘An improved list sphere decoder (ILSD) is proposed based on the conventional list sphere decoder (LSD) and the reduced- complexity maximum likelihood sphere-decoding algorithm. Unlike the conventional LSD with fixed initial radius, the ILSD adopts an adaptive radius to accelerate the list cdnstruction. Characterized by low-complexity and radius-insensitivity, the proposed algorithm makes iterative joint detection and decoding more realizable in multiple-antenna systems. Simulation results show that computational savings of ILSD over LSD are more apparent with more transmit antennas or larger constellations, and with no performance degradation. Because the complexity of the ILSD algorithm almost keeps invariant with the increasing of initial radius, the BER performance can be improved by selecting a sufficiently large radius.
文摘A high performance SDRAM controller for HDTV decoder is designed. MB-based ( macro block) address mapping, adaptive-precharge and command interleaving are adopted in this controller. MB-based address mapping reduces the precharge operations of the video processing unit in one access; adaptive- precharge avoids unnecessary precharge operations; while command interleaving inserts the precharge and activate commands of the next access into the command sequence of the current access, thus reduces the no operation (NOP) cycles. Combination of these three schemes effectively improves the SDRAM performance. Compared with precharge-all scheme, adaptive-precharge and command interleaving reduce the SDRAM overhead cycles by 70% and increases SDRAM performance by up to 19.2% in the best case. This controller has been implemented in an AVS SoC and the frequency is 200MHz.
基金Project supported by the Applied Materials Shanghai Research and Development Foundation (Grant No.08700741000)the Foundation of Shanghai Municipal Education Commission (Grant No.2006AZ068)
文摘This paper presents an efficient VLSI architecture of the contest-based adaptive variable length code (CAVLC) decoder with power optimized for the H.264/advanced video coding (AVC) standard. In the proposed design, according to the regularity of the codewords, the first one detector is used to solve the low efficiency and high power dissipation problem within the traditional method of table-searching. Considering the relevance of the data used in the process of runbefore's decoding, arithmetic operation is combined with finite state machine (FSM), which achieves higher decoding efficiency. According to the CAVLC decoding flow, clock gating is employed in the module level and the register level respectively, which reduces 43% of the overall dynamic power dissipation. The proposed design can decode every syntax element in one clock cycle. When the proposed design is synthesized at the clock constraint of 100 MHz, the synthesis result shows that the design costs 11 300 gates under a 0.25 μm CMOS technology, which meets the demand of real time decoding in the H.264/AVC standard.
基金the National Natural Science Foundation of China(Grant Nos.11975132 and 61772295)the Natural Science Foundation of Shandong Province,China(Grant No.ZR2019YQ01)the Higher Education Science and Technology Program of Shandong Province,China(Grant No.J18KZ012).
文摘Quantum error-correction codes are immeasurable resources for quantum computing and quantum communication.However,the existing decoders are generally incapable of checking node duplication of belief propagation(BP)on quantum low-density parity check(QLDPC)codes.Based on the probability theory in the machine learning,mathematical statistics and topological structure,a GF(4)(the Galois field is abbreviated as GF)augmented model BP decoder with Tanner graph is designed.The problem of repeated check nodes can be solved by this decoder.In simulation,when the random perturbation strength p=0.0115-0.0116 and number of attempts N=60-70,the highest decoding efficiency of the augmented model BP decoder is obtained,and the low-loss frame error rate(FER)decreases to 7.1975×10^(-5).Hence,we design a novel augmented model decoder to compare the relationship between GF(2)and GF(4)for quantum code[[450,200]]on the depolarization channel.It can be verified that the proposed decoder provides the widely application range,and the decoding performance is better in QLDPC codes.
文摘In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.