Abstract
As SIMD (single instruction, multiple data) DSPs (digital signal processors) integrate more and more processing units on chip, the flexibility and bandwidth efficiency of parallel memory access increasingly determine achievable performance. This paper analyzes in detail the memory access problems faced by the parallel radix-2 FFT (fast Fourier transform) algorithm on a general SIMD DSP. Simple XOR logic on part of the address bits performs the SIMD parallel address translation and yields conflict-free SIMD parallel memory access for FFT computation. Several vector memory access instructions with special shuffle modes are also proposed; they completely eliminate the extra shuffle instructions that radix-2 FFT otherwise requires on a SIMD architecture. Finally, the methods are applied to optimize the vector memory (VM) of YHFT-Matrix2, a 16-way SIMD digital signal processor. Test results show that the optimized VM achieves fully pipelined, conflict-free parallel memory access for FFT and 100% utilization of parallel memory access bandwidth, at the cost of an 18% increase in hardware (area) overhead. Compared with the design before optimization, FFTs of different sizes achieve speedups of 1.32 to 2.66.
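The idea of XOR-based address translation can be illustrated with a small sketch. The abstract does not say which address bits YHFT-Matrix2 actually XORs, so the mapping below is an assumption: a generic bank-skewing scheme that XORs together all 4-bit fields of the address to select one of 16 banks. The names `linear_bank`, `xor_bank`, and `conflicting_strides` are hypothetical and exist only for this illustration. The check covers the power-of-two strides that arise between radix-2 butterfly operands.

```python
LANES = 16       # 16-way SIMD, one memory bank per lane (as in the abstract)
BANK_BITS = 4    # log2(LANES)

def linear_bank(addr):
    """Plain low-order interleaving: bank = addr mod 16."""
    return addr % LANES

def xor_bank(addr):
    """Assumed skewing scheme: XOR all 4-bit fields of the address."""
    bank = 0
    while addr:
        bank ^= addr & (LANES - 1)
        addr >>= BANK_BITS
    return bank

def conflicting_strides(n_points, bank_of):
    """Return the power-of-two strides at which some 16-lane vector access
    maps two lanes to the same bank, for an n_points-element array."""
    bad = []
    stride = 1
    while stride * LANES <= n_points:
        block = stride * LANES
        # Every element belongs to exactly one strided vector per stride.
        vectors = (
            [base + offset + lane * stride for lane in range(LANES)]
            for base in range(0, n_points, block)
            for offset in range(stride)
        )
        if any(len({bank_of(a) for a in v}) != LANES for v in vectors):
            bad.append(stride)
        stride *= 2
    return bad

if __name__ == "__main__":
    print("linear:", conflicting_strides(1024, linear_bank))
    print("xor:   ", conflicting_strides(1024, xor_bank))
```

Under these assumptions, linear interleaving conflicts at every stride greater than 1, while the XOR mapping leaves every tested strided vector conflict-free. The mapping also stays one-to-one, because the low four address bits can be recovered from the bank number and the row index `addr >> 4`.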
Source
Acta Electronica Sinica (《电子学报》), 2016, No. 2, pp. 241-246 (6 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (No. 61472432)