Abstract
To raise the success rate of grid tasks, this paper studies how to make grid task scheduling more reliable. Most existing fault-tolerant grid scheduling algorithms rely on task replication to reduce the probability that a software or hardware fault on a node causes a task to fail. They consider neither the simultaneous failure of several replicas of a task when the network environment they share fails, nor their simultaneous failure when the nodes hosting them all lack the same required resource. To address this problem, the concept of node similarity and a method for computing it are proposed and applied to a fault-tolerant grid task scheduling algorithm. The proposed algorithm assigns the replicas of a task to grid nodes that are less similar to one another, making full use of the distributed and heterogeneous nature of grids to further reduce the probability of task failure.
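The idea of placing replicas on dissimilar nodes can be illustrated with a minimal sketch. The abstract does not give the paper's similarity formula or its scheduling procedure, so everything below is an assumption made for illustration: each node is described by a set of attribute strings (network segment, operating system, libraries), similarity is taken to be the Jaccard index of two such sets, and replicas are placed greedily. The names `similarity`, `place_replicas`, and `NODES` are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical node descriptions: each node is a set of attribute tags.
# Nodes "a" and "b" share a network segment and software stack.
NODES: Dict[str, FrozenSet[str]] = {
    "a": frozenset({"net:campus1", "os:linux", "lib:mpi"}),
    "b": frozenset({"net:campus1", "os:linux", "lib:mpi"}),
    "c": frozenset({"net:campus2", "os:linux", "lib:mpi"}),
    "d": frozenset({"net:campus3", "os:solaris", "lib:pvm"}),
}


def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index of two attribute sets (an assumed measure, not the paper's)."""
    union = a | b
    if not union:
        return 1.0  # two nodes with no attributes are indistinguishable
    return len(a & b) / len(union)


def place_replicas(nodes: Dict[str, FrozenSet[str]], k: int) -> List[str]:
    """Greedily choose up to k nodes so that each new node is as dissimilar
    as possible from the nodes already chosen."""
    if k <= 0 or not nodes:
        return []
    names = sorted(nodes)          # deterministic order
    chosen = [names[0]]            # seed with the first node by name
    while len(chosen) < min(k, len(names)):
        remaining = [n for n in names if n not in chosen]
        # Score a candidate by its highest similarity to any chosen node;
        # ties are broken by node name.
        best = min(
            remaining,
            key=lambda n: (max(similarity(nodes[n], nodes[c]) for c in chosen), n),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    print(place_replicas(NODES, 3))
```

With these sample nodes, a second replica goes to `d` rather than `b`, because `b` shares its network segment and software stack with `a` and would be exposed to the same failures. A real scheduler would also weigh load, availability, and cost, which this sketch ignores.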
Source
《高技术通讯》
EI
CAS
CSCD
PKU Core
2008, No. 12, pp. 1224-1230 (7 pages)
Chinese High Technology Letters
Funding
Supported by the 973 Program (G2005CB321806)
Keywords
grid, task scheduling, fault tolerance, node similarity