Coordinate multi-core DSP YHFT-QMBase: architecture and implementation (cited by: 7)
Abstract: The Vector-SIMD architecture has attracted considerable interest in recent years owing to its strong performance in signal processing. Combining Vector-SIMD with multi-core technology is an important direction in high-performance DSP architecture design. However, in current multi-core Vector-SIMD processors some hardware units still cooperate poorly, which keeps the system from reaching its full performance. This paper presents the design and implementation of a coordinated multi-core DSP, YHFT-QMBase, which strengthens the coordination of the multi-core Vector-SIMD architecture in four respects: (1) a dynamic coupling execution scheme redefines how the scalar and vector units work together; (2) a matrix-style communication mechanism enhances interaction among vector lanes; (3) an unaligned vector memory access scheme solves the data-sharing problem among vector memory banks; (4) a Qlink-Crossbar scheme supports efficient background coarse-grained data transfer among cores. Evaluation results show that these mechanisms give a traditional Vector-SIMD architecture an average performance gain of 58.5%. YHFT-QMBase has been taped out successfully, and measurements show a peak single-precision floating-point multiply-accumulate rate of 32 GFMACS, a 16-bit fixed-point multiply-accumulate rate of 128 GMACS, and a typical power consumption of 8.65 W.
Source: Scientia Sinica Informationis (《中国科学:信息科学》), CSCD, Peking University Core (北大核心), 2015, No. 4, pp. 560-573 (14 pages)
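Funding: Supported by the National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基) (Grant No. 2009ZX01034-001-001-006)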
Keywords: Vector-SIMD; multi-core; DSP; coordination; evaluation
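The figures quoted in the abstract can be combined into a few derived ratios. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and it assumes that the 58.5% "performance gain" means a throughput improvement over the baseline. Dividing peak throughput by typical power gives only a rough indicator, since the two numbers need not describe the same operating point.

```python
# Figures quoted in the abstract (the only data taken from the source).
PEAK_FP32_GFMACS = 32.0    # single-precision floating-point multiply-accumulate
PEAK_INT16_GMACS = 128.0   # 16-bit fixed-point multiply-accumulate
TYPICAL_POWER_W = 8.65     # typical power consumption
AVERAGE_GAIN = 0.585       # average gain over a traditional Vector-SIMD design


def per_watt(throughput: float, power_w: float) -> float:
    """Throughput divided by power (hypothetical helper, rough indicator only)."""
    return throughput / power_w


def speedup_from_gain(gain: float) -> float:
    """Assumption: a gain of g means throughput is (1 + g) times the baseline."""
    return 1.0 + gain


def relative_runtime(gain: float) -> float:
    """Runtime as a fraction of the baseline, under the same assumption."""
    return 1.0 / speedup_from_gain(gain)


if __name__ == "__main__":
    print(f"FP32:  {per_watt(PEAK_FP32_GFMACS, TYPICAL_POWER_W):.2f} GFMACS/W")
    print(f"INT16: {per_watt(PEAK_INT16_GMACS, TYPICAL_POWER_W):.2f} GMACS/W")
    print(f"Fixed-point to floating-point peak ratio: "
          f"{PEAK_INT16_GMACS / PEAK_FP32_GFMACS:.0f}x")
    print(f"Speedup: {speedup_from_gain(AVERAGE_GAIN):.3f}x, "
          f"relative runtime: {relative_runtime(AVERAGE_GAIN):.3f}")
```

Under these assumptions the chip delivers roughly 3.7 GFMACS/W in single precision and 14.8 GMACS/W in 16-bit fixed point, and a 58.5% gain corresponds to a runtime of about 63% of the baseline.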
