Design of a Message-passing Multi-core System (cited by: 2)
Abstract: A multi-core system-on-chip was designed around the message-passing programming model. The system integrates 16 small RISC processors in a 4×4 2D mesh network-on-chip (NoC). Each processor uses a configurable private SRAM for instruction and data storage, and inter-processor packet communication is carried out through wormhole-switching routers and network interfaces. At the software level, basic routines for data transfer and process synchronization were implemented, and three applications were written in the SPMD parallel pattern to verify the system and analyze its performance. Simulation and FPGA test results show that, for integer matrix multiplication, floating-point FFT computation, and template matching on grayscale images, the system achieves maximum parallel speedups of 7.6, 10.5, and 15.9, respectively.
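Authors: 胡哲琨, 陈杰
Source: Journal of Hunan University: Natural Sciences (《湖南大学学报(自然科学版)》), 2013, No. 8, pp. 102-109 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (grants 61234003, 61036004, 61221004)
Keywords: multi-core system; network-on-chip; message passing; parallel computing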
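
As a reading aid for the speedup figures in the abstract, the short sketch below converts them into parallel efficiency (speedup divided by core count). It assumes the peak speedups were measured with all 16 cores active, which the abstract does not state explicitly. The names `CORES`, `peak_speedup`, and `parallel_efficiency` are illustrative and do not come from the paper.

```python
# Illustrative only: parallel efficiency = speedup / number of cores.
# Assumption: the peak speedups were obtained with all 16 cores in use.
CORES = 16  # 4x4 mesh, one RISC processor per node (from the abstract)

# Peak speedups reported in the abstract.
peak_speedup = {
    "integer matrix multiplication": 7.6,
    "floating-point FFT": 10.5,
    "grayscale template matching": 15.9,
}


def parallel_efficiency(speedup: float, cores: int = CORES) -> float:
    """Return the fraction of ideal linear speedup that was achieved."""
    return speedup / cores


efficiency = {name: parallel_efficiency(s) for name, s in peak_speedup.items()}

for name, eff in efficiency.items():
    print(f"{name}: speedup {peak_speedup[name]} -> efficiency {eff:.1%}")
```

Under that assumption, template matching scales almost linearly, while integer matrix multiplication reaches a little under half of the ideal speedup.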

References (12 in total; 10 listed)

  • 1. DALLY W J, TOWLES B. Route packets, not wires: on-chip interconnection networks[C]//Proceedings of Design Automation Conference. Las Vegas, USA: ACM Press, 2001: 684-689.
  • 2. VANGAL S R, HOWARD J, RUHL G, et al. An 80-tile sub-100-W TeraFLOPS processor in 65-nm CMOS[J]. IEEE Journal of Solid-State Circuits, 2008, 43(1): 29-41.
  • 3. WENTZLAFF D, GRIFFIN P, HOFFMANN H, et al. On-chip interconnection architecture of the tile processor[J]. IEEE Micro, 2007, 27(5): 15-31.
  • 4. HU Wei-wu, WANG Ru, CHEN Yun-ji, et al. Godson-3B: a 1GHz 40W 8-core 128GFLOPS processor in 65nm CMOS[C]//IEEE International Solid-State Circuits Conference Digest of Technical Papers. San Francisco, USA: Mira Digital Publishing, 2011: 76-78.
  • 5. 王焕东, 高翔, 陈云霁, 胡伟武. Design and implementation of the Godson-3 (龙芯3号) interconnection system[J]. Journal of Computer Research and Development (计算机研究与发展), 2008, 45(12): 2001-2010. Cited by: 22
  • 6. HOWARD J, DIGHE S, VANGAL S R, et al. A 48-core IA-32 processor in 45 nm CMOS using on-die message-passing and DVFS for performance and power scaling[J]. IEEE Journal of Solid-State Circuits, 2011, 46(1): 173-183.
  • 7. KIM D H, ATHIKULWONGSE K, HEALY M, et al. 3D-MAPS: 3D massively parallel processor with stacked memory[C]//IEEE International Solid-State Circuits Conference Digest of Technical Papers. San Francisco, USA: Mira Digital Publishing, 2012: 188-190.
  • 8. YU Zhi-yi, YOU Kai-di, XIAO Rui-jin, et al. An 800MHz 320mW 16-core processor with message-passing and shared-memory inter-core communication mechanisms[C]//IEEE International Solid-State Circuits Conference Digest of Technical Papers. San Francisco, USA: Mira Digital Publishing, 2012: 64-66.
  • 9. DALLY W J, TOWLES B. Principles and practices of interconnection networks[M]. San Francisco: Morgan Kaufmann, 2003: 305-388.
  • 10. YOO J, YOO S, CHOI K. Active memory processor for network-on-chip based architecture[J]. IEEE Transactions on Computers, 2012, 61(5): 622-635.

