Abstract
Vector-SIMD architectures have attracted considerable interest in recent years owing to their strong performance in signal processing. Combining Vector-SIMD with multi-core technology is an important direction in the design of high-performance DSP architectures. However, in current multi-core Vector-SIMD processors some hardware units still cooperate poorly, which keeps the system from reaching its full performance. This paper presents the design and implementation of a cooperative multi-core DSP, YHFT-QMBase, which strengthens the coordination of the multi-core Vector-SIMD architecture in four respects: (1) a dynamic coupling execution scheme redefines how the scalar and vector units work together; (2) a matrix-style communication mechanism enhances interaction among vector lanes; (3) an unaligned vector memory access scheme solves the problem of sharing data among vector memory banks; (4) a Qlink-Crossbar scheme supports efficient background coarse-grained data transfer among cores. Evaluation results show that the proposed mechanisms yield an average performance gain of 58.5% over traditional Vector-SIMD architectures. YHFT-QMBase has been successfully taped out; measurements show a peak of 32 GFMACS for single-precision floating-point multiply-accumulate and 128 GMACS for 16-bit fixed-point multiply-accumulate, with a typical power consumption of 8.65 W.
Source
Scientia Sinica (Informationis) (《中国科学:信息科学》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2015, Issue 4, pp. 560-573 (14 pages)
Funding
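Supported by the National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" ("核高基") (Grant No. 2009ZX01034-001-001-006)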