Abstract
In high-performance computer systems built on the high-dimensional fat-tree (K-Ary N-Bridge) topology, a leaf switch failure seriously affects system use. To improve the availability and maintainability of such systems, a deterministic routing fault-tolerant strategy based on misrouting is proposed for this topology. The basic idea is to bypass the failed leaf switch by misrouting: the route jumps to another leaf switch in the same dimension and then reaches the destination node through normal routing. The strategy shields a failed leaf switch without affecting system use. Fault-tolerance experiments were carried out on a real K-Ary N-Bridge system, and the results show that the strategy quickly shields the failed leaf switch as expected and effectively improves the efficiency of system maintenance.
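The bypass idea in the abstract can be illustrated with a minimal sketch. This is a toy model, not the authors' implementation: it assumes leaf switches are addressed by coordinate tuples, that switches differing in exactly one coordinate are directly reachable, and that normal routing is deterministic dimension-order routing. The names `normal_route`, `fault_tolerant_route`, and the radix parameter `k` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Switch = Tuple[int, ...]  # assumed addressing: one coordinate per dimension


def normal_route(src: Switch, dst: Switch) -> List[Switch]:
    """Deterministic dimension-order route: correct one coordinate at a time."""
    path = [src]
    cur = list(src)
    for d in range(len(src)):
        if cur[d] != dst[d]:
            cur[d] = dst[d]
            path.append(tuple(cur))
    return path


def fault_tolerant_route(
    src: Switch, dst: Switch, failed: Set[Switch], k: int
) -> Optional[List[Switch]]:
    """Return a route that avoids failed leaf switches, or None if none is found.

    If the normal route is blocked, misroute one hop to another leaf switch
    in a single dimension, then continue with normal routing from there.
    """
    if src in failed or dst in failed:
        return None  # endpoints attached to a failed switch are unreachable
    path = normal_route(src, dst)
    if not any(s in failed for s in path):
        return path  # no fault on the normal route
    for d in range(len(src)):          # dimension used for the misrouting hop
        for v in range(k):             # candidate coordinate in that dimension
            if v == src[d]:
                continue
            detour = src[:d] + (v,) + src[d + 1:]
            if detour in failed:
                continue
            rest = normal_route(detour, dst)
            if not any(s in failed for s in rest):
                return [src] + rest
    return None


if __name__ == "__main__":
    # Leaf switch (2, 0) lies on the normal route from (0, 0) to (2, 3).
    print(fault_tolerant_route((0, 0), (2, 3), {(2, 0)}, k=4))
```

The sketch tries only a single misrouting hop and returns the first working detour, which keeps the route deterministic for a given fault set; handling multiple simultaneous faults or deadlock avoidance is outside what this toy model covers.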
Authors
徐佳庆
万文
蔡东京
唐付桥
何杰
张磊
XU Jiaqing; WAN Wen; CAI Dongjing; TANG Fuqiao; HE Jie; ZHANG Lei (School of Computer, National University of Defense Technology, Changsha, Hunan 410073, China; National Supercomputer Center in Guangzhou, Sun Yat-sen University, Guangzhou, Guangdong 510006, China)
Source
《计算机应用》
CSCD
Peking University Core Journal
2018, No. 5, pp. 1393-1398 (6 pages)
Journal of Computer Applications
Funding
National Key Research and Development Program of China (2016YFB0200203)
National Natural Science Foundation of China, General Program (61572509)
Keywords
high-dimensional fat-tree topology (K-Ary N-Bridge)
interconnection fault
routing fault-tolerance strategy
high performance computing
network maintenance