摘要
MapReduce是一个能够对大规模数据进行分布式处理的框架,目前被各个领域广泛应用。在提供MapReduce服务的集群中,如何保证不同优先级用户的截止时间限定是MapReduce作业调度问题的一个挑战。针对这一问题,提出了一个基于排队网络的多优先级作业调度算法(MPSA)。首先分析和归纳了基于MapReduce模型的算法,提出了三种常见模式,采用Jackson排队网络对基于MapReduce模型的算法建立了数学模型,应用该网络模型可以求出不同优先级队列对资源的需求;随后使用AR(1)模型进行预测,使算法可以动态地适应不同的用户访问量;利用二分查找算法,分步计算出不同优先级在map阶段和reduce阶段分配的槽位数;最后实现了在MapReduce模型中应用的实时调度算法。实验结果表明,与传统的FIFO和公平调度算法相比,本文提出的算法在用户到达率和任务规模变化的情况下,可以更加有效地满足不同优先级用户的截止时间限定。
MapReduce is a distributed computing framework for big data processing, which has been widely used in various fields. It's a challenge to ensure the deadline of different priority users in the cluster providing MapReduce services. To solve this problem, a queuing network based multi-priority scheduling algorithm (MPSA) is proposed. Firstly, the MapReduce based algorithms are summarized and analyzed, three common patterns are proposed, and the Jackson queuing network is used to build a mathematic model of the MapReduce based algorithms. The mathematic model can be used to find the resource demands of different priority queues. Secondly, the AR(1) model is used to predict the num- bers of accessing users, and the binary search algorithm is used to calculate the assigned slot numbers of different priority users in map phase and reduce phase. Finally, a real time scheduling algorithm running in the MapReduce framework is implemented. Experimental results show that, compared with the tradi- tional FIFO and fair scheduling algorithm, the proposed scheduling algorithm can ensure the defined deadlines of different priority users more effectively when the user arrival rates and the task scales change.
出处
《计算机工程与科学》
CSCD
北大核心
2014年第12期2286-2295,共10页
Computer Engineering & Science
基金
国家自然科学基金资助项目(61300195)
辽宁省教育厅科学研究一般资助项目(L2013099)
中央高校基本科研业务费资助项目(N110323009)