A multi dimensional concatenation scheme for block codes is introduced, in which information symbols are interleaved and re encoded for more than once. It provides a convenient platform to design high performance co...A multi dimensional concatenation scheme for block codes is introduced, in which information symbols are interleaved and re encoded for more than once. It provides a convenient platform to design high performance codes with flexible interleaver size. Coset based MAP soft in/soft out decoding algorithms are presented for the F24 code. Simulation results show that the proposed coding scheme can achieve high coding gain with flexible interleaver length and very low decoding complexity.展开更多
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Co...The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.展开更多
The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered de...The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study the circular-shifting network for decoding LDPC codes with arbitrary Z factor, especially for decoding large Z (Z P) codes, where P is the decoder parallelism. By buffering the P-length slices from the memory, and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for arbitrary sized data shifting consumes only one times of additional resource cost compared to the traditional solution for only maximum P sized data shifting, and achieves significant saving on area and routing complexity.展开更多
In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and ...In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and disadvantages of them are listed. A modified overlapped partly parallel decoding that not only inherits the advantages of the two algorithms, but also overcomes the shortcomings of the two algorithms is proposed. The simulation results show that the three kinds of decoding have the same decoding performance; modified overlapped partly parallel decoding improves the iterative convergence rate and the throughput of system.展开更多
Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memor...Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memory occupation when running on a uniprocessor computer. This paper proposes a parallel decoder for linear block codes, using parallel genetic algorithms (PGA). The good performance and time complexity are confirmed by theoretical study and by simulations on BCH(63,30,14) codes over both AWGN and flat Rayleigh fading channels. The simulation results show that the coding gain between parallel and single genetic algorithm is about 0.7 dB at BER = 10﹣5 with only 4 processors.展开更多
A high speed and low power Viterbi decoder architecture design based on deep pipelined, clock gating and toggle filtering has been presented in this paper. The Add-Compare-Select (ACS) and Trace Back (TB) units and it...A high speed and low power Viterbi decoder architecture design based on deep pipelined, clock gating and toggle filtering has been presented in this paper. The Add-Compare-Select (ACS) and Trace Back (TB) units and its sub circuits of the decoder have been operated in deep pipelined manner to achieve high transmission rate. The Power dissipation analysis is also investigated and compared with the existing results. The techniques that have been employed in our low-power design are clock-gating and toggle filtering. The synthesized circuits are placed and routed in the standard cell design environment and implemented on a Xilinx XC2VP2fg256-6 FPGA device. Power estimation obtained through gate level simulations indicated that the proposed design reduces the power dissipation of an original Viterbi decoder design by 68.82% and a speed of 145 MHz is achieved.展开更多
A memory and driving clock efficient design scheme to achieve WCDMA high-speed channel decoder on a single XILINX’ XVC1000E FPGA chip is presented. Using a modified MAP algorithm, say parallel Sliding Window logarith...A memory and driving clock efficient design scheme to achieve WCDMA high-speed channel decoder on a single XILINX’ XVC1000E FPGA chip is presented. Using a modified MAP algorithm, say parallel Sliding Window logarithmic Maximum A Posterior (PSW-log-MAP), the on-chip turbo decoder can decode an information bit by only an average of two clocks per iteration. On the other hand, a high-parallel pipeline Viterbi algorithm is adopted to realize the 256-state convolutional code decoding. The final decoder with an 8×chip-clock (30 72MHz) driving can concurrently process a data rate up to 2 5Mbps of turbo coded sequences and a data rate over 400kbps of convolutional codes. There is no extern memory needed. Test results show that the decoding performance is only 0 2~0 3dB or less lost comparing to float simulation.展开更多
In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC...In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.展开更多
In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural o...In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.展开更多
This paper investigates the interference cancellation (IC) scheme for uplink cognitive radio systems, using the spectrum underlay strategy where the primary users (PUs) and the secondary users (SUs) coexist and ...This paper investigates the interference cancellation (IC) scheme for uplink cognitive radio systems, using the spectrum underlay strategy where the primary users (PUs) and the secondary users (SUs) coexist and operate in the same spectrum. Joint MMSE-based parallel interference cancellation (PIC) and Turbo decoding scheme is proposed to reduce the interference to the PUs, as well as to the SUs, in which the minimum mean square estimation (MMSE) filter is only employed in the first iteration, regarded as the "weakest link" of the whole detection process, to improve the quality of the preliminary detections results before they are fed to the Turbo decoder. Simulation results show that the proposed scheme can efficiently eliminate the interference to the PUs, as well as to the SUs.展开更多
The time delay of Turbo codes due to its iterative decoding is the main bottleneck of its application in real-time channel. However, the time delay can be greatly shortened through the adoption of parallel decod-ing a...The time delay of Turbo codes due to its iterative decoding is the main bottleneck of its application in real-time channel. However, the time delay can be greatly shortened through the adoption of parallel decod-ing algorithm, dividing the received bits into several sub-blocks and processing in parallel. This letter mainly discusses the applicability of turbo codes in high-speed real-time channel through the study of a parallel turbo decoding algorithm based on 3GPP-proposed turbo encoder and interleaver in various channel. Simulation re-sult shows that, by choosing an appropriate sub-block length, the time delay can be obviously shortened with-out degrading the performance and increasing hardware complexity, and furthermore indicates the applicability of Turbo codes in high-speed real-time channel.展开更多
文摘A multi dimensional concatenation scheme for block codes is introduced, in which information symbols are interleaved and re encoded for more than once. It provides a convenient platform to design high performance codes with flexible interleaver size. Coset based MAP soft in/soft out decoding algorithms are presented for the F24 code. Simulation results show that the proposed coding scheme can achieve high coding gain with flexible interleaver length and very low decoding complexity.
文摘The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.
文摘The conventional methodology for designing QC-LDPC decoders is applied for fixed configurations used in wireless communication standards, and the supported largest expansion factor Z (the parallelism of the layered decoding) is a fixed number. In this paper, we study the circular-shifting network for decoding LDPC codes with arbitrary Z factor, especially for decoding large Z (Z P) codes, where P is the decoder parallelism. By buffering the P-length slices from the memory, and assembling the shifted slices in a fixed routine, the P-parallelism shift network can process Z-parallelism circular-shifting tasks. The implementation results show that the proposed network for arbitrary sized data shifting consumes only one times of additional resource cost compared to the traditional solution for only maximum P sized data shifting, and achieves significant saving on area and routing complexity.
基金Sponsored by the National Natural Science Foundation of China( Grant No. 61032003)the Fundamental Research Funds for the Central Universities( Grant No. HIT. NSRIF.2012021)
文摘In this paper, according to the AR4JA codes in deep space communication, two kinds of iterative decoding including partly parallel decoding and overlapped partly parallel decoding are analyzed, and the advantages and disadvantages of them are listed. A modified overlapped partly parallel decoding that not only inherits the advantages of the two algorithms, but also overcomes the shortcomings of the two algorithms is proposed. The simulation results show that the three kinds of decoding have the same decoding performance; modified overlapped partly parallel decoding improves the iterative convergence rate and the throughput of system.
文摘Genetic algorithms offer very good performances for solving large optimization problems, especially in the domain of error-correcting codes. However, they have a major drawback related to the time complexity and memory occupation when running on a uniprocessor computer. This paper proposes a parallel decoder for linear block codes, using parallel genetic algorithms (PGA). The good performance and time complexity are confirmed by theoretical study and by simulations on BCH(63,30,14) codes over both AWGN and flat Rayleigh fading channels. The simulation results show that the coding gain between parallel and single genetic algorithm is about 0.7 dB at BER = 10﹣5 with only 4 processors.
文摘A high speed and low power Viterbi decoder architecture design based on deep pipelined, clock gating and toggle filtering has been presented in this paper. The Add-Compare-Select (ACS) and Trace Back (TB) units and its sub circuits of the decoder have been operated in deep pipelined manner to achieve high transmission rate. The Power dissipation analysis is also investigated and compared with the existing results. The techniques that have been employed in our low-power design are clock-gating and toggle filtering. The synthesized circuits are placed and routed in the standard cell design environment and implemented on a Xilinx XC2VP2fg256-6 FPGA device. Power estimation obtained through gate level simulations indicated that the proposed design reduces the power dissipation of an original Viterbi decoder design by 68.82% and a speed of 145 MHz is achieved.
文摘A memory and driving clock efficient design scheme to achieve WCDMA high-speed channel decoder on a single XILINX’ XVC1000E FPGA chip is presented. Using a modified MAP algorithm, say parallel Sliding Window logarithmic Maximum A Posterior (PSW-log-MAP), the on-chip turbo decoder can decode an information bit by only an average of two clocks per iteration. On the other hand, a high-parallel pipeline Viterbi algorithm is adopted to realize the 256-state convolutional code decoding. The final decoder with an 8×chip-clock (30 72MHz) driving can concurrently process a data rate up to 2 5Mbps of turbo coded sequences and a data rate over 400kbps of convolutional codes. There is no extern memory needed. Test results show that the decoding performance is only 0 2~0 3dB or less lost comparing to float simulation.
文摘In this paper,it has proposed a realtime implementation of low-density paritycheck(LDPC) decoder with less complexity used for satellite communication on FPGA platform.By adopting a(2048.4096)irregular quasi-cyclic(QC) LDPC code,the proposed partly parallel decoding structure balances the complexity between the check node unit(CNU) and the variable node unit(VNU) based on min-sum(MS) algorithm,thereby achieving less Slice resources and superior clock performance.Moreover,as a lookup table(LUT) is utilized in this paper to search the node message stored in timeshare memory unit,it is simple to reuse and save large amount of storage resources.The implementation results on Xilinx FPGA chip illustrate that,compared with conventional structure,the proposed scheme can achieve at last 28.6%and 8%cost reduction in RAM and Slice respectively.The clock frequency is also increased to 280 MHz without decoding performance deterioration and convergence speed reduction.
基金supported by the National High-Technology Research and Development Program of China (Grant No.2003AA123310), and the National Natural Science Foundation of China (Grant Nos.60332030, 60572157)
文摘In this paper we discuss a novel storage scheme for simultaneous memory access in parallel turbo decoder. The new scheme employs vertex coloring in graph theory. Compared to a similar method that also uses unnatural order in storage, our scheme requires 25 more memory blocks but allows a simpler configuration for variable sizes of code lengths that can be implemented on-chip. Experiment shows that for a moderate to high decoding throughput (40-100 Mbps), the hardware cost is still affordable for 3GPP's (3rd generation partnership project) interleaver.
基金Project supported by the National Natural Science Foundation of China (Grant No.60972055)the Development Foundation of the Education Commission of Shanghai Municipality (Grant No.09CG40)+1 种基金the Shanghai Pujiang Program (Grant No.08PJ14057)the Science and Technology Commission of Shanghai Municipality (Grant No.10220710300)
文摘This paper investigates the interference cancellation (IC) scheme for uplink cognitive radio systems, using the spectrum underlay strategy where the primary users (PUs) and the secondary users (SUs) coexist and operate in the same spectrum. Joint MMSE-based parallel interference cancellation (PIC) and Turbo decoding scheme is proposed to reduce the interference to the PUs, as well as to the SUs, in which the minimum mean square estimation (MMSE) filter is only employed in the first iteration, regarded as the "weakest link" of the whole detection process, to improve the quality of the preliminary detections results before they are fed to the Turbo decoder. Simulation results show that the proposed scheme can efficiently eliminate the interference to the PUs, as well as to the SUs.
文摘The time delay of Turbo codes due to its iterative decoding is the main bottleneck of its application in real-time channel. However, the time delay can be greatly shortened through the adoption of parallel decod-ing algorithm, dividing the received bits into several sub-blocks and processing in parallel. This letter mainly discusses the applicability of turbo codes in high-speed real-time channel through the study of a parallel turbo decoding algorithm based on 3GPP-proposed turbo encoder and interleaver in various channel. Simulation re-sult shows that, by choosing an appropriate sub-block length, the time delay can be obviously shortened with-out degrading the performance and increasing hardware complexity, and furthermore indicates the applicability of Turbo codes in high-speed real-time channel.