This paper addresses the problem of improving the robustness and the efficiency of a distributed system at the same time. It combines fault tolerance based on active replication with load balancing. The pros and cons of both techniques are analyzed, and a novel load balancing framework for fault-tolerant systems with active replication is presented. Its hierarchical architecture is described in detail. The framework can dynamically adjust fault-tolerant groups and their memberships in response to system load. Three potential task scheduler group selection methods are proposed and evaluated in simulation tests. Further analysis of the test data yields observations useful for system design, including the effects of task arrival intensity and task set size, and the relationship between total task execution time and single-task execution time.