Abstract
Low-density parity-check (LDPC) codes are a class of high-performance error-control codes that have been adopted by several communication standards, but the heavy computational load of their decoding algorithms limits their potential. LDPC decoders based on general-purpose graphics processing units (GPGPUs) have attracted considerable attention in recent years because of their lower cost and greater flexibility. We analyze the parallelism of the sum-product algorithm (SPA) and propose an interleaver representation of the Tanner graph that simplifies the decoding algorithm. Building on the characteristics of the GPU architecture, we propose a top-down, multi-stage optimization strategy that progressively exploits the GPU's acceleration capability. Experimental results show that balancing computation against memory access, coalescing and aligning global memory accesses, and making aggressive use of on-chip high-speed resources (e.g., shared memory and registers) significantly improve GPU performance. The proposed decoder achieves a 383x speedup over a CPU-based implementation and outperforms existing GPU-based LDPC decoders in overall performance.
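The abstract does not spell out how the Tanner graph is turned into an interleaver. One common reading, sketched here as an assumption rather than as the authors' construction, is to number every edge of the graph twice: once grouped by check node and once grouped by variable node. The permutation between the two numberings then acts as the interleaver that shuffles messages between the two update phases. The function names `build_interleaver`, `interleave`, and `deinterleave` and the toy matrix `H` are hypothetical.

```python
def build_interleaver(H):
    """Map each edge's index in check-node order to its index in variable-node order.

    H is a parity-check matrix given as a list of rows of 0/1 entries.
    """
    # Edges in check-node order: scan H row by row.
    edges = [(r, c) for r, row in enumerate(H) for c, bit in enumerate(row) if bit]
    # The same edges in variable-node order: sorted by column, then by row.
    by_variable = sorted(range(len(edges)), key=lambda k: (edges[k][1], edges[k][0]))
    perm = [0] * len(edges)
    for pos, k in enumerate(by_variable):
        perm[k] = pos
    return perm


def interleave(msgs, perm):
    """Reorder messages from check-node order into variable-node order."""
    out = [None] * len(msgs)
    for k, p in enumerate(perm):
        out[p] = msgs[k]
    return out


def deinterleave(msgs, perm):
    """Reorder messages from variable-node order back into check-node order."""
    return [msgs[p] for p in perm]


# Toy parity-check matrix (an assumption for illustration): 2 checks, 4 variables.
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
perm = build_interleaver(H)

# Label each edge message so the regrouping is visible.
check_order = ["c0v0", "c0v1", "c0v3", "c1v1", "c1v2", "c1v3"]
variable_order = interleave(check_order, perm)
print(perm)
print(variable_order)
```

With this layout, the messages that one check node or one variable node consumes sit next to each other in memory, which is the kind of contiguous access pattern that coalesced global memory reads on a GPU favor.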
Source
China Sciencepaper (《中国科技论文》)
CAS
PKU Core Journal (北大核心)
2013, No. 7, pp. 626-632 (7 pages)
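Funding
Specialized Research Fund for the Doctoral Program of Higher Education of China (20114307110001)
National Natural Science Foundation of China (60873016, 61170083)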
Keywords
LDPC decoder
sum-product algorithm
general-purpose graphics processing unit
optimization strategy
parallel computing