期刊文献+

一种高效的面向基2 FFT算法的SIMD并行存储结构 被引量:4

An Efficient SIMD Parallel Memory Structure for Radix-2 FFT Computation
下载PDF
导出
摘要 随着SIMD(Single Instruction Multiple Data stream)结构DSP(Digital Signal Processor)片上集成了越来越多的处理单元,并行访存的灵活性及带宽效率对实际运算性能的影响越来越大.本文详细分析了一般SIMD结构DSP中基2 FFT(Fast Fourier Transform)并行算法面临的访存问题,采用简单的部分地址异或逻辑完成SIMD并行访存地址转换,实现了FFT运算的无冲突SIMD并行访存;提出了几种带特殊混洗模式的向量访存指令,可完全消除SIMD结构下基2FFT运算时需要的额外混洗指令操作.最后将其应用于某16路SIMD数字信号处理器YHFT-Matrix2中向量存储器VM的优化设计.测试结果表明,采用该SIMD并行存储结构优化的VM以增加18%的硬件开销实现了FFT运算全流水无冲突并行访存和100%并行访存带宽利用率;相比优化前的设计,不同点数FFT运算可获得1.32~2.66的加速比. As more and more execution units are integrated in the digital signal processor( DSP) with single instruction multiple data stream( SIMD) extension,the flexibility and bandwidth efficiency of parallel memory access have significant effects on its whole practical performance. Based on detailed analysis of the memory access problems for radix-2 fast Fourier transform( FFT) algorithm in general SIMD DSP,this paper used parts of the address bit XOR logic to realize memory access address translation,and achieved conflict-free parallel SIMD memory accesses for FFT computation. Then several memory access instructions with special shuffle modes were brought forward,which could completely eliminate extra shuffling instruction operations of radix-2 FFT algorithm in the SIMD architecture. Finally,the vector memory( VM) in 16-way SIMD DSP YHFT-Matrix2 was optimized by above methods. The test results showthat the optimized VMcan realize fully pipelined conflict-free memory accesses and100% parallel memory access bandwidth utilization with increase of 18% area overheads. Compared with the design before optimization,the performance of different points radix-2 FFT can achieve speedup ranging from 1. 32 to 2. 66.
出处 《电子学报》 EI CAS CSCD 北大核心 2016年第2期241-246,共6页 Acta Electronica Sinica
基金 国家自然科学基金(No.61472432)
关键词 快速傅里叶变换 单指令多数据流 低位交叉 并行存储 访问冲突 数据混洗 FFT SIMD low-order interleave parallel memory access conflict data shuffle
  • 相关文献

参考文献17

  • 1Intel Corporation.Intel 64 and IA-32 Architectures Software Developer Manuals,Volume1:Basic Architecture[EB/OL].http://www.intel.com/products/processor/manuals,2015-01-05.
  • 2ARM Corporation.The Architecture for the Digital World[EB/OL].http://www.arm.com/zh/products/processors/technologies/dsp-simd.php,2015-01-05.
  • 3Woh M,Seo S,Mahlke S,et al.AnySP:anytime anywhere anyway signal processing.Proceedings of the 36th Annual International Symposium on Computer Architecture[C].Austin,Texas,USA:ACM,2009.128-139.
  • 4Deepu Talla,Lizy Kurian John,Doug Burger.Bottlenecks in multimedia processing with SIMD style extensions and architectural enhancements[J].IEEE Transactions on Computers,2003,52(8):1015-1030.
  • 5James W Cooley,John W Tukey.An algorithm for the machine calculation of complex Fourier series[J].Mathematics of Computation,1965,19(90):297-301.
  • 6Yun-Nan Chang,Keshal K Parhi.An efficient pipelined FFT architecture[J].IEEE Transactions on Circuits and Systems-II:Analog and Digital Signal Processing,2003,50(6):322-325.
  • 7B M Baas.A low-power,high-performance,1024-point FFT processor[J].IEEE Journal of Solid-State Circuits,1999,34(3):380-387.
  • 8Stephen Richardson,Ofer Shacham,et al.An area-efficient minimum-time FFT schedule using single-ported memory.Proceedings of 2013 IFIP/IEEE 21st International Conference on VLSI-SoC[C].Istanbul,Turkey:IEEE,2013.39-44.
  • 9Ji-yang Yu,Yang Li.An efficient conflict-free parallel memory access scheme for dual-butterfly constant geometry radix-2 FFT processor.ICSP2008 Proceedings[C].Beijing,China:IEEE,2008.458-461.
  • 10Chen-Fong Hsiao,Yuan Chen,Chen-Yi Lee.A Generalized mixed-radix algorithm for memory-based FFT processors[J].IEEE Transactions on Circuits and Systems-II:Express Briefs,2010,57( 1):26-30.

二级参考文献18

  • 1Woh M, Lin Y, Seo S, et al. Analyzing the Scalability of SIMD for the Next Generation Software Defined Radio [ C]// Proceedings of the IEEE International Conference on Acoustics, Speech and Signal 2008:5388 -5391.
  • 2Moon T K, Stirling W C. Mathematical Methods and Algorithms for Signal Processing [J]. Upper Saddle River, NJ, U. S. A. : Prentice Hall, Inc. , 2000.
  • 3Cooperman M, Andrade P. CMOS Gigabit-per-second Switching [ C]//Proceedings of the IEEE Journal of Solid- state Circuits, 1993, 28(6) : 631 -639.
  • 4Van Berkel K, Heinle F, Meuwissen P P E et al. Vector Processing as an Enabler for Software Defined Radio in Handheld Devices [ J ]. EURASIP Journal on Applied Signal Processing, 2005 (16) :2613 -2625.
  • 5Limberg T, Winter M, Bimberg M, et al. A Heterogeneous MPSoC with Hardware Supported Dynamic Task Scheduling for Software Defined Radio [ C ]//Proceedings of the Design Automation conference, 2009.
  • 6Woh M, Lin Y, SCo Set al. From SODA to Scotch: The Evolution of a Wireless Baseband Processor [ C]//Proceedings of 41st IEEE International Symposium on Micrearchiteeture, Lake Comn, Italy, 2008 : 8 - 12.
  • 7Woh M, Seo S, Mahlkes, et al. AnySP: Anytime Anywhere Anyway Signal Processing [ C ]//Proceedings of ISCA, 2009 : 128 - 139..
  • 8Freescale Semiconductor. Alfivec Engine Benchmarks [ EB/OL]. [2011 -09 -20]. http://www, freescale, com/webapp/sps/sitJ overview, jsp? nodeld =0162468rH3bTdGmKqWSNt2, 2006.
  • 9Lee R. Subword Parallelism with MAX-2 [ J]. IEEE Micro, 1996,16(4) :51 -59.
  • 10ARM. Cortex-A8 Technical Reference Manual [ EB/OL]. [ 2011 - 09 - 20 ]. http ://infocenter. ann. corn 2008, ARM DDI 0344H.

同被引文献22

引证文献4

二级引证文献8

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部