Abstract
This paper addresses the problem of improving the robustness and the efficiency of a distributed system at the same time. It considers fault tolerance based on active replication together with load balancing, analyzes the strengths and weaknesses of both techniques, and presents a novel load balancing framework for fault-tolerant systems with active replication. The hierarchical architecture of the framework is described in detail. The framework dynamically adjusts the number of fault-tolerant groups and the number of members in each group according to system load. Three candidate methods for selecting the task scheduler group are proposed and evaluated in simulation. Analysis of the simulation data yields observations useful for distributed system design, including the effects of task arrival intensity and task set size, and the relationship between single-task execution time and total task-set execution time.
Funding
The National Natural Science Foundation of China (No. 60273038)
The Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry
Program for New Century Excellent Talents in University, MOE (No. NCEF040478)