Abstract
System fault-recovery time is a key metric for fault tolerance in manycore systems. To accelerate recovery from faults, a fast topology reconfiguration method is proposed for manycore systems based on the message-passing model. First, a mapping region is defined for each core according to the faults in the physical topology, and the Hungarian algorithm is used to construct an initial solution quickly. Second, by restricting twisted (interleaved) mappings, tabu search rapidly optimizes the initial solution to obtain the final reconfiguration mapping. Finally, the node mapping table on each compute node is updated according to this mapping, which completes the topology reconfiguration and achieves core-level fault tolerance for the manycore system. Experimental results show that the method quickly finds an optimized topology reconfiguration scheme and recovers the system successfully, with low fault-tolerance time overhead.
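The first step of the abstract (building an initial mapping with the Hungarian algorithm) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the paper's formulation: a 2D mesh with spare columns, Manhattan distance as the mapping cost, a hypothetical `radius` standing in for the per-core mapping region, and the function names `hungarian` and `reconfigure`. The tabu-search refinement and the mapping-table update are not shown.

```python
def hungarian(cost):
    """Minimum-cost assignment of n rows to m >= n columns (O(n^2 * m)).

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)          # row potentials (1-based)
    v = [0] * (m + 1)          # column potentials (1-based)
    p = [0] * (m + 1)          # p[j] = row matched to column j, 0 = free
    way = [0] * (m + 1)        # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:     # reached a free column: augment
                break
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def reconfigure(rows, cols, spare_cols, faulty, radius=2):
    """Map a rows x cols logical mesh onto the healthy physical cores.

    Assumptions (hypothetical): spare cores sit in extra columns on the
    right, cost is Manhattan distance, and a core's mapping region is
    every healthy core within `radius` hops of its logical position.
    """
    logical = [(r, c) for r in range(rows) for c in range(cols)]
    healthy = [(r, c) for r in range(rows)
               for c in range(cols + spare_cols) if (r, c) not in faulty]
    if len(healthy) < len(logical):
        raise ValueError("not enough healthy cores")

    penalty = 10 ** 6          # finite stand-in for "outside the region"

    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    cost = [[dist(l, h) if dist(l, h) <= radius else penalty
             for h in healthy] for l in logical]
    assignment = hungarian(cost)
    mapping = {l: healthy[j] for l, j in zip(logical, assignment)}
    total = sum(dist(l, h) for l, h in mapping.items())
    if any(dist(l, h) > radius for l, h in mapping.items()):
        raise ValueError("no mapping exists within the given radius")
    return mapping, total
```

For example, `reconfigure(3, 3, 1, {(1, 1)})` maps a 3x3 logical mesh onto a 3x4 physical mesh whose core at row 1, column 1 has failed. A real implementation would then refine this mapping (the abstract uses tabu search with restrictions on twisted mappings) before updating the per-node mapping tables.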
Source
《计算机辅助设计与图形学学报》
EI
CSCD
PKU Core Journal
2014, Issue 11, pp. 2079-2090 (12 pages)
Journal of Computer-Aided Design & Computer Graphics
Funding
2013 Harbin Applied Technology Research and Development Project (2013RFQXJ095)
Keywords
manycore
fault tolerance
topology reconfiguration
message passing interface