Fault tolerance is essential in cluster computing, and many well-known cluster-computing systems implement it with checkpoint/restart mechanisms. Existing checkpointing algorithms, however, cannot restore the state of a file system when a running program is rolled back, so existing fault-tolerance systems place many restrictions on file access. This paper presents the SCR algorithm, which is based on atomic operations and consistent scheduling and can restore file system state. In the SCR algorithm, file system calls are classified as idempotent or non-idempotent operations: a non-idempotent operation modifies the file system's state, while an idempotent operation does not. The SCR algorithm tracks changes to the file system state by logging to disk each non-idempotent operation issued by user programs, together with the information needed to reverse it. When a program is rolled back to a checkpoint, the SCR algorithm reverts the file system to its state at the time of the last checkpoint. With the SCR algorithm, user programs may use any file operation.
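The logging idea described above can be illustrated with a minimal sketch. This is not the paper's SCR implementation: the class `FileUndoLog`, its method names, and the backup-file naming are hypothetical, the log is kept in memory rather than on disk for brevity, and the atomic-operation and consistent-scheduling aspects are not modeled. It only shows how recording a pre-image for each state-modifying operation allows files to be reverted to the last checkpoint.

```python
import os
import shutil


class FileUndoLog:
    """Hypothetical undo log for file operations (illustrative only)."""

    def __init__(self, backup_dir):
        self.backup_dir = backup_dir  # where pre-images are stored
        os.makedirs(backup_dir, exist_ok=True)
        self.entries = []  # undo records since the last checkpoint, oldest first
        self.counter = 0

    def _save_copy(self, path):
        # Keep a pre-image of the file so it can be restored on rollback.
        self.counter += 1
        backup = os.path.join(self.backup_dir, f"pre_{self.counter}")
        shutil.copy2(path, backup)
        return backup

    def _log_overwrite(self, path):
        # Record how to undo an operation that is about to replace `path`.
        if os.path.exists(path):
            self.entries.append(("restore", path, self._save_copy(path)))
        else:
            self.entries.append(("delete", path, None))

    def read(self, path):
        # Does not modify file system state, so nothing is logged.
        with open(path) as f:
            return f.read()

    def write(self, path, data):
        # Modifies state: log the undo record first, then perform the write.
        self._log_overwrite(path)
        with open(path, "w") as f:
            f.write(data)

    def remove(self, path):
        self.entries.append(("restore", path, self._save_copy(path)))
        os.remove(path)

    def rename(self, src, dst):
        # If dst exists it will be overwritten, so keep its pre-image too.
        if os.path.exists(dst):
            self.entries.append(("restore", dst, self._save_copy(dst)))
        self.entries.append(("rename", dst, src))
        os.replace(src, dst)

    def checkpoint(self):
        # A new checkpoint makes earlier undo records unnecessary.
        self.entries.clear()

    def rollback(self):
        # Undo logged operations in reverse order, newest first.
        while self.entries:
            action, path, extra = self.entries.pop()
            if action == "restore":
                shutil.copy2(extra, path)
            elif action == "delete":
                if os.path.exists(path):
                    os.remove(path)
            elif action == "rename":
                os.replace(path, extra)
```

In use, a program would call `checkpoint()` whenever its process state is checkpointed and `rollback()` before restarting from that checkpoint, so that the files it touched match the restored process state. A real system would intercept the system calls themselves and persist the log so that it survives a crash.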