Abstract
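In fault-tolerant high-performance computing environments, adding a checkpoint mechanism introduces extra overhead into the system, so choosing an appropriate checkpoint interval can optimize system performance. Vaidya's contribution is that the equation for the optimal checkpoint interval derived from his model is independent of the checkpoint latency (L) and the checkpoint recovery time (R). This paper introduces a new time-segment-based model, NSBM, which replaces the mean overhead ratio of Vaidya's model with mean system utilization, a concept that is easier to understand in the fault-tolerance field, and derives a solving equation that is likewise independent of L and R. Experimental results show that the NSBM solution model, compared with …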
Many applications, sequential or parallel, take a long time to complete and can lose a significant amount of computation if a failure occurs during execution. Checkpointing and rollback is a technique for minimizing this loss in an environment subject to failures. However, a checkpoint scheme introduces additional overhead into the system, and a checkpoint interval that is too large or too small may degrade system performance. Choosing the checkpoint interval properly can optimize system performance; the difficulty lies in determining the interval at which the checkpoint scheme performs best. Vaidya's major contribution is that the equation for the optimal checkpoint interval in his model is independent of the checkpoint latency and of the checkpoint recovery time that the application spends when it rolls back after an error. This paper introduces a new segment-based model, NSBM, which uses mean availability, a concept that is easier to understand in fault tolerance, in place of the mean checkpoint overhead in Vaidya's model, and derives a new equation that is also independent of checkpoint latency and recovery time. Finally, we present a set of experimental computation results and analyze the relation between the two models. We conclude that NSBM is more effective than Vaidya's model for computing the checkpoint interval.
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2003, Issue 3, pp. 448-451 (4 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National High Performance Computing Foundation (993 13)