
Fault-Tolerant Strategy for Topology Reconfiguration of Manycore Systems Based on Message Passing Model

Original title: 基于消息传递模型的众核拓扑重配置容错方法
Abstract: System fault-recovery time is a key metric for fault tolerance in manycore systems. To accelerate recovery from faults, a fast topology reconfiguration strategy is proposed for manycore systems based on the message passing model. First, a mapping domain is defined for each core according to the fault condition of the physical topology, and the Hungarian algorithm is used to quickly construct an initial solution. Second, by restricting twisted (interleaved) mappings, Tabu search quickly optimizes the initial solution to obtain the final reconfiguration mapping solution. Finally, the node mapping table on each computational node is updated according to the reconfiguration mapping solution, which completes the topology reconfiguration and realizes core-level fault tolerance for the manycore system. Experimental results show that the proposed strategy rapidly finds an optimized topology reconfiguration solution and recovers the system successfully, with low time overhead for fault tolerance.
Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), indexed in EI, CSCD, and the Peking University Core Journals list. 2014, No. 11, pp. 2079-2090 (12 pages).
Funding: 2013 Harbin Applied Technology Research and Development Project (2013RFQXJ095)
Keywords: manycore; fault tolerance; topology reconfiguration; message passing interface
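The first step described in the abstract (building an initial mapping with the Hungarian algorithm, restricted to a per-core mapping domain) can be illustrated with a minimal sketch. The abstract does not give the paper's cost function or the exact definition of a mapping domain, so the following are assumptions made only for illustration: cost is the Manhattan distance between a logical core and a candidate physical core, a mapping domain is every healthy physical core within a fixed `radius`, and pairs outside the domain receive a large finite penalty `BIG`. The names `hungarian` and `initial_mapping` are hypothetical, and the Tabu search refinement is not shown.

```python
BIG = 10**6          # assumed penalty for pairs outside the mapping domain
INF = float("inf")


def hungarian(cost):
    """Minimum-cost assignment of n rows to m columns (requires n <= m).

    Standard O(n^2 * m) Hungarian algorithm using row/column potentials.
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    u = [0] * (n + 1)        # row potentials (1-based)
    v = [0] * (m + 1)        # column potentials (1-based)
    p = [0] * (m + 1)        # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                      # flip matches along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def initial_mapping(rows, cols, phys_cols, faulty, radius):
    """Map a rows x cols logical mesh onto a rows x phys_cols physical mesh.

    Faulty physical cores are excluded; candidates farther than `radius`
    (the assumed mapping domain) are discouraged with the BIG penalty.
    Returns {logical (r, c): physical (r, c)}.
    """
    logical = [(r, c) for r in range(rows) for c in range(cols)]
    healthy = [(r, c) for r in range(rows) for c in range(phys_cols)
               if (r, c) not in faulty]
    cost = [[manhattan(l, h) if manhattan(l, h) <= radius else BIG
             for h in healthy] for l in logical]
    assign = hungarian(cost)
    return {l: healthy[j] for l, j in zip(logical, assign)}


if __name__ == "__main__":
    # 3x3 logical mesh, 3x4 physical mesh (one spare column), core (1, 1) faulty.
    for logical_core, physical_core in initial_mapping(3, 3, 4, {(1, 1)}, 2).items():
        print(logical_core, "->", physical_core)
```

In a message passing setting, a dictionary like the one returned here would play the role of the node mapping table that each computational node updates after reconfiguration. A large finite penalty is used instead of infinity so that the potential updates stay well defined.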

