This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Syste...This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.展开更多
In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete mem...In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete memoryless channels(BDMCs),the proposed decoders quantize the virtual subchannels of polar codes to maximize mutual information(MMI)between source bits and quantized symbols.The nested structure of polar codes ensures that the MMI quantization can be implemented stage by stage.Simulation results show that the proposed MMI decoders with 4 quantization bits outperform the existing nonuniform quantized decoders that minimize mean-squared error(MMSE)with 4 quantization bits,and yield even better performance than uniform MMI quantized decoders with 5 quantization bits.Furthermore,the proposed 5-bit quantized MMI decoders approach the floating-point decoders with negligible performance loss.展开更多
At present,convolutional neural networks(CNNs)and transformers surpass humans in many situations(such as face recognition and object classification),but do not work well in identifying fibers in textile surface images...At present,convolutional neural networks(CNNs)and transformers surpass humans in many situations(such as face recognition and object classification),but do not work well in identifying fibers in textile surface images.Hence,this paper proposes an architecture named FiberCT which takes advantages of the feature extraction capability of CNNs and the long-range modeling capability of transformer decoders to adaptively extract multiple types of fiber features.Firstly,the convolution module extracts fiber features from the input textile surface images.Secondly,these features are sent into the transformer decoder module where label embeddings are compared with the features of each type of fibers through multi-head cross-attention and the desired features are pooled adaptively.Finally,an asymmetric loss further purifies the extracted fiber representations.Experiments show that FiberCT can more effectively extract the representations of various types of fibers and improve fiber identification accuracy than state-of-the-art multi-label classification approaches.展开更多
This letter proposes a sliced-gated-convolutional neural network with belief propagation(SGCNN-BP) architecture for decoding long codes under correlated noise. The basic idea of SGCNNBP is using Neural Networks(NN) to...This letter proposes a sliced-gated-convolutional neural network with belief propagation(SGCNN-BP) architecture for decoding long codes under correlated noise. The basic idea of SGCNNBP is using Neural Networks(NN) to transform the correlated noise into white noise, setting up the optimal condition for a standard BP decoder that takes the output from the NN. A gate-controlled neuron is used to regulate information flow and an optional operation—slicing is adopted to reduce parameters and lower training complexity. Simulation results show that SGCNN-BP has much better performance(with the largest gap being 5dB improvement) than a single BP decoder and achieves a nearly 1dB improvement compared to Fully Convolutional Networks(FCN).展开更多
Increasing research has focused on semantic communication,the goal of which is to convey accurately the meaning instead of transmitting symbols from the sender to the receiver.In this paper,we design a novel encoding ...Increasing research has focused on semantic communication,the goal of which is to convey accurately the meaning instead of transmitting symbols from the sender to the receiver.In this paper,we design a novel encoding and decoding semantic communication framework,which adopts the semantic information and the contextual correlations between items to optimize the performance of a communication system over various channels.On the sender side,the average semantic loss caused by the wrong detection is defined,and a semantic source encoding strategy is developed to minimize the average semantic loss.To further improve communication reliability,a decoding strategy that utilizes the semantic and the context information to recover messages is proposed in the receiver.Extensive simulation results validate the superior performance of our strategies over state-of-the-art semantic coding and decoding policies on different communication channels.展开更多
Belief propagation list(BPL) decoding for polar codes has attracted more attention due to its inherent parallel nature. However, a large gap still exists with CRC-aided SCL(CA-SCL) decoding.In this work, an improved s...Belief propagation list(BPL) decoding for polar codes has attracted more attention due to its inherent parallel nature. However, a large gap still exists with CRC-aided SCL(CA-SCL) decoding.In this work, an improved segmented belief propagation list decoding based on bit flipping(SBPL-BF) is proposed. On the one hand, the proposed algorithm makes use of the cooperative characteristic in BPL decoding such that the codeword is decoded in different BP decoders. Based on this characteristic, the unreliable bits for flipping could be split into multiple subblocks and could be flipped in different decoders simultaneously. On the other hand, a more flexible and effective processing strategy for the priori information of the unfrozen bits that do not need to be flipped is designed to improve the decoding convergence. In addition, this is the first proposal in BPL decoding which jointly optimizes the bit flipping of the information bits and the code bits. In particular, for bit flipping of the code bits, a H-matrix aided bit-flipping algorithm is designed to enhance the accuracy in identifying erroneous code bits. The simulation results show that the proposed algorithm significantly improves the errorcorrection performance of BPL decoding for medium and long codes. It is more than 0.25 d B better than the state-of-the-art BPL decoding at a block error rate(BLER) of 10^(-5), and outperforms CA-SCL decoding in the low signal-to-noise(SNR) region for(1024, 0.5)polar codes.展开更多
The"Decoding Zhonghua"International Conference on Dialogue among Civilisations,hosted by China International Public Relations Association,China Ethnic News and Academy of Contemporary China and World Studies...The"Decoding Zhonghua"International Conference on Dialogue among Civilisations,hosted by China International Public Relations Association,China Ethnic News and Academy of Contemporary China and World Studies was held in Beijing on January 17th.With the theme"Pursing Harmonious Coexistence of Civilisations through Dialogue".展开更多
A global optimization algorithm (GOA) for parallel Chien search circuit in Reed-Solomon (RS) (255,239) decoder is presented. By finding out the common modulo 2 additions within groups of Galois field (GF) mult...A global optimization algorithm (GOA) for parallel Chien search circuit in Reed-Solomon (RS) (255,239) decoder is presented. By finding out the common modulo 2 additions within groups of Galois field (GF) multipliers and pre-computing the common items, the GOA can reduce the number of XOR gates efficiently and thus reduce the circuit area. Different from other local optimization algorithms, the GOA is a global one. When there are more than one maximum matches at a time, the best match choice in the GOA has the least impact on the final result by only choosing the pair with the smallest relational value instead of choosing a pair randomly. The results show that the area of parallel Chien search circuits can be reduced by 51% compared to the direct implementation when the group-based GOA is used for GF multipliers and by 26% if applying the GOA to GF multipliers separately. This optimization scheme can be widely used in general parallel architecture in which many GF multipliers are involved.展开更多
The first domestic total dose hardened 2μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8- line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transisto...The first domestic total dose hardened 2μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8- line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transistor threshold voltage shifts,circuit static leakage currents,and I-V curves as a function of total dose up to 3× 10^5rad(Si). The worst case threshold voltage shifts of the front channels are less than 20mV for nMOS transistors at 3 × 10^5rad(Si) and follow-up irradiation and less than 70mV for the pMOS transistors. Furthermore, no significant radiation induced leakage currents and functional degeneration are observed.展开更多
A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not ...A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not a power of two, the modified Benes network can achieve the most optimal performance. This modified Benes network is non-blocking and can perform any sorts of permutations, so it can support 19 modes specified in the WiMAX system. Furthermore, an efficient algorithm to generate the control signals for all the 2 × 2 switches in this network is derived, which can reduce the hardware complexity and overall latency of the modified Benes network. Synthesis results show that the proposed control signal generator can save 25.4% chip area and the overall network latency can be reduced by 36. 2%.展开更多
Viterbi decoding is widely used in many radio systems. Because of the large computation complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development...Viterbi decoding is widely used in many radio systems. Because of the large computation complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of the multicore technology, multicore platforms become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-art multi-core processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft input Viterbi decoder for WiMAX SR Baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell Processor running at 3.2GHz, our Viterbi decoder can achieve the throughput up to 30Mb/s to decode the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated to the SR system and can provide a highly integrated SR solution. The optimization methodology in this module design can be extended to other modules on Cell platform.展开更多
基金supported by the Fundamental Research Funds for the Central Universities(FRF-TP20-062A1)Guangdong Basic and Applied Basic Research Foundation(2021A1515110070)。
文摘This paper presents a software turbo decoder on graphics processing units(GPU).Unlike previous works,the proposed decoding architecture for turbo codes mainly focuses on the Consultative Committee for Space Data Systems(CCSDS)standard.However,the information frame lengths of the CCSDS turbo codes are not suitable for flexible sub-frame parallelism design.To mitigate this issue,we propose a padding method that inserts several bits before the information frame header.To obtain low-latency performance and high resource utilization,two-level intra-frame parallelisms and an efficient data structure are considered.The presented Max-Log-Map decoder can be adopted to decode the Long Term Evolution(LTE)turbo codes with only small modifications.The proposed CCSDS turbo decoder at 10 iterations on NVIDIA RTX3070 achieves about 150 Mbps and 50Mbps throughputs for the code rates 1/6 and 1/2,respectively.
基金financially supported in part by National Key R&D Program of China(No.2018YFB1801402)in part by Huawei Technologies Co.,Ltd.
文摘In this paper,we innovatively associate the mutual information with the frame error rate(FER)performance and propose novel quantized decoders for polar codes.Based on the optimal quantizer of binary-input discrete memoryless channels(BDMCs),the proposed decoders quantize the virtual subchannels of polar codes to maximize mutual information(MMI)between source bits and quantized symbols.The nested structure of polar codes ensures that the MMI quantization can be implemented stage by stage.Simulation results show that the proposed MMI decoders with 4 quantization bits outperform the existing nonuniform quantized decoders that minimize mean-squared error(MMSE)with 4 quantization bits,and yield even better performance than uniform MMI quantized decoders with 5 quantization bits.Furthermore,the proposed 5-bit quantized MMI decoders approach the floating-point decoders with negligible performance loss.
基金National Natural Science Foundation of China(No.61972081)Fundamental Research Funds for the Central Universities,China(No.2232023Y-01)Natural Science Foundation of Shanghai,China(No.22ZR1400200)。
文摘At present,convolutional neural networks(CNNs)and transformers surpass humans in many situations(such as face recognition and object classification),but do not work well in identifying fibers in textile surface images.Hence,this paper proposes an architecture named FiberCT which takes advantages of the feature extraction capability of CNNs and the long-range modeling capability of transformer decoders to adaptively extract multiple types of fiber features.Firstly,the convolution module extracts fiber features from the input textile surface images.Secondly,these features are sent into the transformer decoder module where label embeddings are compared with the features of each type of fibers through multi-head cross-attention and the desired features are pooled adaptively.Finally,an asymmetric loss further purifies the extracted fiber representations.Experiments show that FiberCT can more effectively extract the representations of various types of fibers and improve fiber identification accuracy than state-of-the-art multi-label classification approaches.
基金supported by Beijing Natural Science Foundation (L202003)。
文摘This letter proposes a sliced-gated-convolutional neural network with belief propagation(SGCNN-BP) architecture for decoding long codes under correlated noise. The basic idea of SGCNNBP is using Neural Networks(NN) to transform the correlated noise into white noise, setting up the optimal condition for a standard BP decoder that takes the output from the NN. A gate-controlled neuron is used to regulate information flow and an optional operation—slicing is adopted to reduce parameters and lower training complexity. Simulation results show that SGCNN-BP has much better performance(with the largest gap being 5dB improvement) than a single BP decoder and achieves a nearly 1dB improvement compared to Fully Convolutional Networks(FCN).
基金supported in part by the National Natural Science Foundation of China under Grant No.61931020,U19B2024,62171449,62001483in part by the science and technology innovation Program of Hunan Province under Grant No.2021JJ40690。
文摘Increasing research has focused on semantic communication,the goal of which is to convey accurately the meaning instead of transmitting symbols from the sender to the receiver.In this paper,we design a novel encoding and decoding semantic communication framework,which adopts the semantic information and the contextual correlations between items to optimize the performance of a communication system over various channels.On the sender side,the average semantic loss caused by the wrong detection is defined,and a semantic source encoding strategy is developed to minimize the average semantic loss.To further improve communication reliability,a decoding strategy that utilizes the semantic and the context information to recover messages is proposed in the receiver.Extensive simulation results validate the superior performance of our strategies over state-of-the-art semantic coding and decoding policies on different communication channels.
基金funded by the Key Project of NSFC-Guangdong Province Joint Program(Grant No.U2001204)the National Natural Science Foundation of China(Grant Nos.61873290 and 61972431)+1 种基金the Science and Technology Program of Guangzhou,China(Grant No.202002030470)the Funding Project of Featured Major of Guangzhou Xinhua University(2021TZ002).
文摘Belief propagation list(BPL) decoding for polar codes has attracted more attention due to its inherent parallel nature. However, a large gap still exists with CRC-aided SCL(CA-SCL) decoding.In this work, an improved segmented belief propagation list decoding based on bit flipping(SBPL-BF) is proposed. On the one hand, the proposed algorithm makes use of the cooperative characteristic in BPL decoding such that the codeword is decoded in different BP decoders. Based on this characteristic, the unreliable bits for flipping could be split into multiple subblocks and could be flipped in different decoders simultaneously. On the other hand, a more flexible and effective processing strategy for the priori information of the unfrozen bits that do not need to be flipped is designed to improve the decoding convergence. In addition, this is the first proposal in BPL decoding which jointly optimizes the bit flipping of the information bits and the code bits. In particular, for bit flipping of the code bits, a H-matrix aided bit-flipping algorithm is designed to enhance the accuracy in identifying erroneous code bits. The simulation results show that the proposed algorithm significantly improves the errorcorrection performance of BPL decoding for medium and long codes. It is more than 0.25 d B better than the state-of-the-art BPL decoding at a block error rate(BLER) of 10^(-5), and outperforms CA-SCL decoding in the low signal-to-noise(SNR) region for(1024, 0.5)polar codes.
文摘The"Decoding Zhonghua"International Conference on Dialogue among Civilisations,hosted by China International Public Relations Association,China Ethnic News and Academy of Contemporary China and World Studies was held in Beijing on January 17th.With the theme"Pursing Harmonious Coexistence of Civilisations through Dialogue".
文摘A global optimization algorithm (GOA) for parallel Chien search circuit in Reed-Solomon (RS) (255,239) decoder is presented. By finding out the common modulo 2 additions within groups of Galois field (GF) multipliers and pre-computing the common items, the GOA can reduce the number of XOR gates efficiently and thus reduce the circuit area. Different from other local optimization algorithms, the GOA is a global one. When there are more than one maximum matches at a time, the best match choice in the GOA has the least impact on the final result by only choosing the pair with the smallest relational value instead of choosing a pair randomly. The results show that the area of parallel Chien search circuits can be reduced by 51% compared to the direct implementation when the group-based GOA is used for GF multipliers and by 26% if applying the GOA to GF multipliers separately. This optimization scheme can be widely used in general parallel architecture in which many GF multipliers are involved.
文摘The first domestic total dose hardened 2μm partially depleted silicon-on-insulator (PDSOI) CMOS 3-line to 8- line decoder fabricated in SIMOX is demonstrated. The radiation performance is characterized by transistor threshold voltage shifts,circuit static leakage currents,and I-V curves as a function of total dose up to 3× 10^5rad(Si). The worst case threshold voltage shifts of the front channels are less than 20mV for nMOS transistors at 3 × 10^5rad(Si) and follow-up irradiation and less than 70mV for the pMOS transistors. Furthermore, no significant radiation induced leakage currents and functional degeneration are observed.
基金The National Natural Science Foundation of China(No.60871079)
文摘A modified Benes network is proposed to be used as an optimal shuffle network in worldwide interoperability for microwave access (WiMAX) low density parity check (LDPC) decoders, When the size of the input is not a power of two, the modified Benes network can achieve the most optimal performance. This modified Benes network is non-blocking and can perform any sorts of permutations, so it can support 19 modes specified in the WiMAX system. Furthermore, an efficient algorithm to generate the control signals for all the 2 × 2 switches in this network is derived, which can reduce the hardware complexity and overall latency of the modified Benes network. Synthesis results show that the proposed control signal generator can save 25.4% chip area and the overall network latency can be reduced by 36. 2%.
文摘Viterbi decoding is widely used in many radio systems. Because of the large computation complexity, it is usually implemented with ASIC chips, FPGA chips, or optimized hardware accelerators. With the rapid development of the multicore technology, multicore platforms become a reasonable choice for software radio (SR) systems. The Cell Broadband Engine processor is a state-of-art multi-core processor designed by Sony, Toshiba, and IBM. In this paper, we present a 64-state soft input Viterbi decoder for WiMAX SR Baseband system based on the Cell processor. With one Synergistic Processor Element (SPE) of a Cell Processor running at 3.2GHz, our Viterbi decoder can achieve the throughput up to 30Mb/s to decode the tail-biting convolutional code. The performance demonstrates that the proposed Viterbi decoding implementation is very efficient. Moreover, the Viterbi decoder can be easily integrated to the SR system and can provide a highly integrated SR solution. The optimization methodology in this module design can be extended to other modules on Cell platform.