Funding: the National Natural Science Foundation of China (Nos. 60473031, 60673155); the Natural Science Foundation of Hunan (No. 05JJ30116)
Abstract: Wide-area systems are becoming a popular infrastructure for long-running applications. Rollback-recovery, a common technique for fault tolerance and load balancing, must meet the challenges of scalability and the inherent variability of such applications. Most rollback-recovery protocols, however, scale poorly. Pessimistic message-logging protocols do not have this problem, but their fault-free overhead is sometimes prohibitive. Aiming at good scalability with acceptable overhead, this paper introduces the concept of pessimism grain and presents a coarse-grained pessimistic message-logging scheme. The paper also evaluates the impact of pessimism grain on the performance of the recovery scheme. Experimental results show that pessimism grain is one of the key configuration parameters for reaching a desired performance level. In practice, the pessimism grain should be chosen according to the characteristics of the application.
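The abstract does not define pessimism grain. The sketch below shows one plausible reading as a tunable logger, and it is our assumption, not the paper's scheme. It treats the grain as the number of received messages held in volatile memory before a synchronous flush to stable storage. It also forces a flush before any send, so that no other process comes to depend on an unlogged message. The class `CoarseGrainedLogger` and its methods are hypothetical names. With `grain == 1` it reduces to classic per-message pessimistic logging. Larger grains trade fewer blocking writes for more work to redo after a failure.

```python
from typing import List


class CoarseGrainedLogger:
    """Hypothetical receiver-side message logger with a tunable pessimism grain.

    Assumption (not from the paper): `grain` is the maximum number of received
    messages kept in volatile memory before a synchronous flush to stable storage.
    """

    def __init__(self, grain: int):
        if grain < 1:
            raise ValueError("grain must be at least 1")
        self.grain = grain
        self.volatile: List[str] = []  # logged in memory only, lost on a crash
        self.stable: List[str] = []    # stands in for stable storage
        self.flushes = 0               # each flush models one blocking write

    def flush(self) -> None:
        # Write all buffered messages to stable storage in one synchronous step.
        if self.volatile:
            self.stable.extend(self.volatile)
            self.volatile.clear()
            self.flushes += 1

    def on_receive(self, msg: str) -> None:
        # Buffer the message; block on a flush once `grain` messages accumulate.
        self.volatile.append(msg)
        if len(self.volatile) >= self.grain:
            self.flush()

    def before_send(self) -> None:
        # Flush before any outgoing message so that no other process depends
        # on a message that is not yet on stable storage.
        self.flush()
```

For eight received messages and no sends, a grain of 1 costs eight blocking writes and a grain of 4 costs two. A send in the middle of a batch forces an early flush.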
Funding: Supported by the National Natural Science Foundation of China (No. 60873138); the Postdoctoral Scientific Research Foundation of Heilongjiang (No. LBH-008124); the Fundamental Research Funds for the Central Universities (No. HEUCFT1007)
Abstract: Because mobile hosts move, the checkpoints and message logs of a computing process may be dispersed over different mobile support stations in checkpointing and rollback-recovery protocols for mobile computing. The three existing checkpoint handoff schemes do not adequately balance the efficiency of failure-free execution against the recovery speed of a failed process. This paper proposes a dynamic, adaptive handoff management scheme for checkpointing and rollback-recovery protocols in mobile computing. At each handoff event, a different implementation is selected dynamically to complete the handoff, according to the individual characteristics and current state of each mobile host. Performance analysis shows that the proposed handoff management incurs little performance loss during failure-free execution and achieves fast recovery when a process fails.
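To make "selected dynamically according to each host's features and current state" concrete, here is a minimal sketch of a per-host selection rule. The two strategies, the metrics in `HostProfile`, and the thresholds `ratio` and `small_log_kb` are all hypothetical. The abstract does not describe the actual implementations or decision criteria. "transfer" moves the checkpoint and logs to the new station at handoff time. "pointer" leaves them in place and only records their location.

```python
from dataclasses import dataclass


@dataclass
class HostProfile:
    # All three fields are hypothetical metrics a station might track per host.
    handoff_rate: float  # handoffs per hour observed for this host
    failure_rate: float  # failures per hour observed for this host
    log_size_kb: float   # checkpoint plus message log held at the current station


def choose_handoff(host: HostProfile, ratio: float = 0.1,
                   small_log_kb: float = 64.0) -> str:
    """Pick a handoff implementation for one host at one handoff event.

    "transfer": move the checkpoint and logs to the new station now
                (costlier handoff, fast recovery).
    "pointer":  leave them in place and record where they are
                (cheap handoff, slower recovery that must gather the data).
    """
    if host.log_size_kb <= small_log_kb:
        return "transfer"  # moving a small log is cheap either way
    if host.failure_rate >= ratio * host.handoff_rate:
        return "transfer"  # failures relatively frequent: favor recovery speed
    return "pointer"       # handoffs dominate: favor failure-free efficiency
```

A host that roams often but rarely fails, such as `HostProfile(10, 0.01, 5000)`, gets the cheap "pointer" handoff. The same host with a failure rate of 2 per hour gets "transfer". The rule is re-evaluated at every handoff, so a host's choice can change as its state changes.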
Funding: the Postdoctoral Science Foundation (No. 20060390461); the Basic Research Foundation of Harbin Engineering University (Nos. HEUF040806, HEUFT05009, and HEUFP05020)
Abstract: When applied to mobile computing systems, checkpointing protocols designed for distributed computing systems face many new challenges, such as low wireless bandwidth, frequent disconnections, and the lack of stable storage at mobile hosts. This paper proposes a novel checkpointing protocol that effectively reduces coordination overhead. By using a communication vector, only a few processes participate in a checkpointing event. During checkpointing, the scheme saves the time otherwise spent tracing the dependency tree by sending checkpoint requests to all dependent processes at once. In addition, processes are non-blocking in this scheme, since inconsistency is resolved by a piggyback technique. Hence unnecessary and orphan messages can be avoided. Compared with the traditional coordinated checkpointing approach, the proposed non-blocking algorithm requires a minimal number of processes to take checkpoints. It also reduces checkpoint latency, which imposes less overhead on mobile hosts with limited resources.
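The two mechanisms above can be sketched in simplified form, under stated assumptions rather than as the paper's exact protocol. The first assumes the initiator can see every process's communication vector. That vector is modeled here as `deps[p]`, the set of processes from which `p` has received a message since its last checkpoint. The initiator then computes the full set of processes it transitively depends on and can send all checkpoint requests at once, instead of letting requests propagate hop by hop along the dependency tree. The second is a simplified piggyback rule based on checkpoint sequence numbers (`csn`, a hypothetical name). It shows how a receiver can avoid an orphan message without blocking. Both function names are illustrative.

```python
from collections import deque
from typing import Dict, Set


def checkpoint_set(initiator: int, deps: Dict[int, Set[int]]) -> Set[int]:
    """Processes that must checkpoint along with the initiator (including itself).

    deps[p] holds the processes p has received a message from since its last
    checkpoint (the set entries of p's communication vector). The result is
    the transitive closure of that relation starting from the initiator.
    """
    needed = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for q in deps.get(p, set()):
            if q not in needed:
                needed.add(q)
                queue.append(q)
    return needed


def must_checkpoint_before_delivery(piggybacked_csn: int, own_csn: int) -> bool:
    """Simplified piggyback rule: if the sender has already taken a newer
    checkpoint than the receiver, delivering the message first could make it
    an orphan of the new global checkpoint, so the receiver takes its
    checkpoint before delivery instead of blocking."""
    return piggybacked_csn > own_csn
```

With `deps = {0: {1}, 1: {2}, 2: set(), 3: {0}}`, `checkpoint_set(0, deps)` returns `{0, 1, 2}`. Process 3 received from process 0, but process 0 does not depend on process 3, so process 3 is left out of the checkpointing event.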
Abstract: This paper studies the hard problem of thorough garbage collection in uncoordinated checkpointing algorithms. It first introduces the traditional garbage-collection scheme, which can discard only obsolete checkpoints. It then shows that this traditional method may fail to discard any checkpoint in some special cases. A thorough garbage-collection method is therefore both necessary and urgent: one that discards every checkpoint useless for any future rollback-recovery, including the obsolete ones. The Thorough Garbage Collection Theorem is then proposed and proved. It guarantees the feasibility of thorough garbage collection and also gives a method for computing the set of useful checkpoints.
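For contrast with the thorough method, the traditional scheme can be sketched with the standard rollback-propagation computation of the recovery line. This is the traditional approach the abstract criticizes, not the paper's theorem. Starting from each process's latest stable checkpoint, any receiver whose checkpoint records a message that the sender's checkpoint does not record is rolled back. The loop repeats until the cut is consistent, and checkpoints older than the resulting recovery line are obsolete. The message encoding and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A message is (sender, send_interval, receiver, recv_interval). Interval j of a
# process is its execution between checkpoint j and checkpoint j + 1; checkpoint
# 0 is the initial state, so checkpoint c records exactly the events in intervals < c.
Message = Tuple[int, int, int, int]


def recovery_line(latest: Dict[int, int], messages: List[Message]) -> Dict[int, int]:
    """Roll back from the latest stable checkpoints until no message is orphan."""
    cut = dict(latest)
    changed = True
    while changed:
        changed = False
        for sender, s_int, receiver, r_int in messages:
            received = r_int < cut[receiver]  # receive is recorded in the cut
            sent = s_int < cut[sender]        # send is recorded in the cut
            if received and not sent:
                # Orphan message: roll the receiver back to the checkpoint
                # taken just before it received the message.
                cut[receiver] = r_int
                changed = True
    return cut


def obsolete_checkpoints(latest: Dict[int, int],
                         messages: List[Message]) -> Dict[int, List[int]]:
    """Traditional garbage collection: checkpoints older than the recovery line."""
    cut = recovery_line(latest, messages)
    return {p: list(range(cut[p])) for p in latest}
```

The loop terminates because each rollback strictly lowers some process's checkpoint index, and checkpoint 0 records no receives. The second case below reproduces the failure mode described in the abstract. Two messages crossing in opposite directions cause a domino effect back to the initial states, so the traditional scheme discards nothing: `obsolete_checkpoints({0: 1, 1: 1}, [(0, 1, 1, 0), (1, 0, 0, 0)])` yields `{0: [], 1: []}`. By comparison, `obsolete_checkpoints({0: 2, 1: 2}, [(0, 2, 1, 1)])` yields `{0: [0, 1], 1: [0]}`.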