Abstract
The first-come-first-served (FCFS) task scheduling algorithm on the Flink platform ignores the relationship between task resource requirements and the available resources of nodes. This leads to uneven task loads across nodes and thereby reduces system throughput. To address this problem, a resource-aware task scheduling strategy for the Flink stream computing environment is proposed. First, based on the resource data monitored by the GlobalState module, and considering how well task resource requirements match the available resources of nodes, a task selection algorithm and a node selection algorithm are proposed to choose the task to be executed and the optimal scheduling node. Second, the resource-aware scheduling strategy schedules the selected task onto the optimal scheduling node. Finally, experiments verify the effectiveness of the algorithm. Experimental results show that, compared with the existing scheduling algorithm of the Flink platform, the proposed algorithm improves throughput by an average of about 29.32% and 35.86% on the big data benchmarks WordCount and TeraSort, respectively.
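The abstract names a task selection step and a node selection step but does not spell out their rules, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. The names `Task`, `Node`, `select_task`, `select_node` and `schedule` are hypothetical; the node's free resources stand in for what the GlobalState module would report; and two swapped-in heuristics fill the gaps: tasks are taken largest-demand-first (normalized by total cluster capacity), and each task goes to the feasible node whose scarcest resource keeps the largest free fraction after placement (a max-min leftover, load-balancing rule). Unlike FCFS, both steps look at demand versus available resources.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu: float  # requested CPU cores (illustrative unit, assumed)
    mem: float  # requested memory in GB (illustrative unit, assumed)


@dataclass
class Node:
    name: str
    cpu_cap: float
    mem_cap: float
    # Free resources; assumed to mirror what a monitor such as GlobalState reports.
    cpu_free: float = field(init=False)
    mem_free: float = field(init=False)

    def __post_init__(self) -> None:
        # A freshly registered node starts with all of its capacity free.
        self.cpu_free = self.cpu_cap
        self.mem_free = self.mem_cap

    def fits(self, task: Task) -> bool:
        return task.cpu <= self.cpu_free and task.mem <= self.mem_free


def select_task(pending: list[Task], total_cpu: float, total_mem: float) -> Task:
    """Hypothetical task selection: largest demand normalized by cluster capacity."""
    return max(pending, key=lambda t: t.cpu / total_cpu + t.mem / total_mem)


def select_node(task: Task, nodes: list[Node]) -> Node | None:
    """Hypothetical node selection: among nodes that can host the task, pick the
    one whose scarcest resource keeps the largest free fraction after placement."""
    best: Node | None = None
    best_score = -1.0
    for node in nodes:
        if not node.fits(task):
            continue
        score = min((node.cpu_free - task.cpu) / node.cpu_cap,
                    (node.mem_free - task.mem) / node.mem_cap)
        if score > best_score:  # ties keep the earlier node
            best, best_score = node, score
    return best


def schedule(tasks: list[Task], nodes: list[Node]) -> dict[str, str | None]:
    """Place tasks one by one; maps task name -> node name (None if nothing fits)."""
    total_cpu = sum(n.cpu_cap for n in nodes)
    total_mem = sum(n.mem_cap for n in nodes)
    pending = list(tasks)
    placement: dict[str, str | None] = {}
    while pending:
        task = select_task(pending, total_cpu, total_mem)
        pending.remove(task)
        node = select_node(task, nodes)
        if node is None:
            placement[task.name] = None  # no node has enough free resources
            continue
        node.cpu_free -= task.cpu  # reserve the task's resources on that node
        node.mem_free -= task.mem
        placement[task.name] = node.name
    return placement


if __name__ == "__main__":
    nodes = [Node("A", 8, 16), Node("B", 4, 32)]
    tasks = [Task("t1", 4, 4), Task("t2", 1, 12), Task("t3", 1, 2), Task("t4", 10, 1)]
    print(schedule(tasks, nodes))
    # {'t4': None, 't1': 'A', 't2': 'B', 't3': 'B'}
```

In this toy run, the memory-heavy task `t2` lands on the memory-rich node B instead of draining A's remaining memory, which is the kind of demand-to-resource matching that FCFS placement does not consider. Task `t4` needs more CPU than any node offers, so it is left unplaced.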
Authors
WANG Li-juan
QIAN Yu-rong
ZHANG Meng
YING Chang-tian
ZHAO Yi
(College of Software, Xinjiang University, Urumqi 830046, China; Postdoctoral Station of Electrical Discipline, Xinjiang University, Urumqi 830046, China; College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China)
Source
Journal of Northeast Normal University (Natural Science Edition)
CAS
Peking University Core Journal
2020, No. 2, pp. 66-72 (7 pages)
Funding
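National Natural Science Foundation of China (61562086, 61462079)
Innovation Team Project of the Education Department of Xinjiang Uygur Autonomous Region (XJEDU2016S035)
Doctoral Research Start-up Fund of Xinjiang University (BS150257)
2017 Scientific Research Program of Higher Education Institutions of the Autonomous Region (XJEDU2017T002)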