A customized multi-grain matrix register file for SIMD processors

Cited by: 1

Abstract: Mapping matrix operations onto SIMD processors introduces a large amount of data rearrangement, which lowers system performance. This study proposes a customized Multi-Grain Matrix Register File (MMRF) that supports multi-grained parallel row-wise and column-wise access, eliminating this data rearrangement and improving the performance of matrix operations. The MMRF can be dynamically configured into different parallel access modes, in which one or several sub-matrices can be processed in parallel. Experimental results show that, compared with a traditional Vector Register File (VRF) and a Matrix Register File (MRF), the MMRF achieves average performance improvements of about 2.21x and 1.6x, respectively, while its area increases by 14.3% and 3.7% and its power by 14.6% and 2.2%, respectively. Compared with the TMS320C64x+ processor, the SIMD-based FT-Matrix processor achieves about 5.65x to 7.71x higher performance after adopting the MMRF. With hierarchical full-custom design techniques, the area and critical-path delay of the MMRF are reduced by 17.9% and 39.1%, respectively.
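The abstract does not describe how the MMRF is organized internally, so the following is only a minimal software sketch of the general idea of conflict-free row-wise and column-wise access at more than one granularity. It assumes a classic skewed bank layout (element (r, c) stored in bank (r + c) mod N), which is a textbook technique and not necessarily the circuit used in the paper. The class name `SkewedMatrixRegisterFile` and the methods `read_row`, `read_col`, and `read_sub_cols` are hypothetical names introduced for illustration.

```python
class SkewedMatrixRegisterFile:
    """Toy model (an assumption, not the paper's design): N banks, each N words deep.

    Element (r, c) is stored in bank (r + c) % N at address r, so any full row,
    any full column, and column j of every g x g sub-matrix in one block row
    each touch every bank exactly once.
    """

    def __init__(self, n):
        self.n = n
        self.banks = [[0] * n for _ in range(n)]

    def _bank(self, r, c):
        return (r + c) % self.n

    def load(self, matrix):
        """Store an N x N matrix using the skewed layout."""
        for r in range(self.n):
            for c in range(self.n):
                self.banks[self._bank(r, c)][r] = matrix[r][c]

    def _gather(self, coords):
        """Read one element per coordinate, failing if two reads hit the same bank."""
        used = set()
        out = []
        for r, c in coords:
            b = self._bank(r, c)
            if b in used:
                raise RuntimeError("bank conflict: access is not parallel")
            used.add(b)
            out.append(self.banks[b][r])
        return out

    def read_row(self, r):
        """Row-wise access: all N elements of row r in one step."""
        return self._gather([(r, c) for c in range(self.n)])

    def read_col(self, c):
        """Column-wise access: all N elements of column c in one step."""
        return self._gather([(r, c) for r in range(self.n)])

    def read_sub_cols(self, g, block_row, j):
        """Grain-g access: column j of every g x g sub-matrix in one block row."""
        n = self.n
        coords = [(block_row * g + i, k * g + j)
                  for k in range(n // g) for i in range(g)]
        flat = self._gather(coords)
        return [flat[k * g:(k + 1) * g] for k in range(n // g)]


# Example: a 4 x 4 matrix accessed by row, by column, and as two 2 x 2 sub-matrices.
matrix = [[r * 4 + c for c in range(4)] for r in range(4)]
rf = SkewedMatrixRegisterFile(4)
rf.load(matrix)
row1 = rf.read_row(1)
col2 = rf.read_col(2)
sub_cols = rf.read_sub_cols(2, 0, 1)
```

In this model a column read needs no transpose or shuffle step beforehand, which is the kind of data rearrangement the abstract says the MMRF removes. The grain parameter `g` only changes which coordinates are gathered, loosely mirroring the idea of reconfigurable access modes.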

Authors: ZHANG Kai (张凯), CHEN Shuming (陈书明), WANG Yaohua (王耀华), CHEN Haiyan (陈海燕), LI Zhentao (李振涛)

Affiliation: College of Computer, National University of Defense Technology

Source: Journal of National University of Defense Technology (indexed in EI, CAS, CSCD, PKU Core), 2013, No. 4, pp. 156-160 (5 pages)

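Funding: National Natural Science Foundation of China (60906014, 61070036); Research Fund of the Joint Doctoral Supervisor Group on High-Performance Computing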

Keywords: SIMD; matrix operation; multi-grain matrix register file

Classification: TP316 [Automation and Computer Technology: Computer Software and Theory]
