Fault Tolerance for Lifeline-Based Global Load Balancing
Authors: Claudia Fohry, Marco Bungart, Paul Plock. Journal of Software Engineering and Applications, 2017, No. 13, pp. 925-958 (34 pages)
Fault tolerance has become an important issue in parallel computing. It is often addressed at the system level, but application-level approaches are receiving increasing attention. We consider a parallel programming pattern, the task pool, and provide a fault-tolerant implementation in a library. Specifically, our work refers to lifeline-based global load balancing, an advanced task pool variant that is implemented in the GLB framework of the parallel programming language X10. The variant considers side-effect-free tasks whose results are combined into a final result by reduction. Our algorithm is able to recover from multiple fail-stop failures. If recovery is not possible, it halts with an error message. In the algorithm, each worker regularly saves its local task pool contents in the main memory of a backup partner. Backups are updated for steals. After failures, the backup partner takes over saved copies and collects others. In case of multiple failures, invocations of the restore protocol are nested. We have implemented the algorithm by extending the source code of the GLB library. In performance measurements on up to 256 places, we observed an overhead between 0.5% and 30%. The particular value depends on the application’s steal rate and task pool size. Sources of performance overhead have been further analyzed with a logging component.
Keywords: Fault Tolerance; Task Pool; Load Balancing; GLB; Resilient X10
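
The backup idea summarized in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the GLB library's code or the authors' protocol: the names `Worker` and `Cluster`, the ring-based choice of backup partner, the use of integers as tasks with summation as the reduction, and the immediate refresh of backups after a steal or a failure are all hypothetical simplifications. Failures are handled one at a time, so the nested restore invocations mentioned in the abstract are not modeled.

```python
class Worker:
    def __init__(self, wid, tasks):
        self.wid = wid
        self.pool = list(tasks)  # local task pool (tasks are plain integers here)
        self.result = 0          # partial result of the reduction (a sum)
        self.alive = True
        self.backups = {}        # owner id -> (pool snapshot, result snapshot)


class Cluster:
    """Toy model: every worker keeps a snapshot in a partner's memory."""

    def __init__(self, pools):
        self.workers = [Worker(i, p) for i, p in enumerate(pools)]
        for w in self.workers:
            self.save_backup(w)

    def partner(self, w):
        # Assumed rule: the next live worker in ring order holds w's backup.
        n = len(self.workers)
        for offset in range(1, n):
            cand = self.workers[(w.wid + offset) % n]
            if cand.alive:
                return cand
        return None

    def save_backup(self, w):
        p = self.partner(w)
        if p is not None:
            # Pool and partial result are saved together, so the snapshot
            # is consistent: a task is either still pending or already summed.
            p.backups[w.wid] = (list(w.pool), w.result)

    def step(self, w, count=1):
        # Process up to `count` side-effect-free tasks; no backup is taken here.
        for _ in range(count):
            if w.pool:
                w.result += w.pool.pop()

    def steal(self, thief, victim):
        half = len(victim.pool) // 2
        loot, victim.pool = victim.pool[:half], victim.pool[half:]
        thief.pool.extend(loot)
        # Backups are updated for steals so no task is lost or duplicated.
        self.save_backup(victim)
        self.save_backup(thief)

    def fail(self, w):
        # Fail-stop: the worker and everything in its memory disappear.
        w.alive = False
        w.pool, w.backups = [], {}
        holder = self.partner(w)
        if holder is None or w.wid not in holder.backups:
            raise RuntimeError("recovery not possible: backup lost")
        pool, result = holder.backups.pop(w.wid)
        holder.pool.extend(pool)   # the partner takes over the saved copy
        holder.result += result
        for other in self.workers:  # re-establish backups among survivors
            if other.alive:
                self.save_backup(other)

    def run(self):
        live = [w for w in self.workers if w.alive]
        for w in live:
            self.step(w, len(w.pool))
        return sum(w.result for w in live)
```

In this model, work that a worker completed after its last snapshot is simply recomputed by the partner. That is safe only because tasks are side-effect-free and the snapshot stores the pool and the partial result together. If the last remaining worker fails, no backup holder exists and the sketch halts with an error.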