Abstract
In the topology of a big data stream computing platform, tasks on the key nodes communicate in different ways, and this causes large communication overhead. To address this problem, a task scheduling strategy for the Flink environment was proposed. The weight of each topology edge was determined by the size of the data stream between the two tasks it connects, and the directed acyclic graph (DAG) was transformed into a topological critical path model. The inter-node communication overhead of critical tasks was then minimized while keeping the load difference among the nodes on the critical path small. Experimental results show that, compared with the existing task scheduling strategies of the Flink platform, the proposed algorithm reduces the average computation latency by 13.09% during the execution of the WordCount and TwitterSentiment jobs, effectively improving system performance.
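The abstract does not give the scheduling algorithm itself, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that an edge weight equals the data-flow volume between two tasks, that the "critical path" is the heaviest source-to-sink path of the DAG (found here with a topological sort plus dynamic programming), and that node-load balance can be approximated by splitting that path into contiguous chunks of nearly equal size, one chunk per worker. The task names, edge weights and the helpers `critical_path`, `assign_contiguous` and `inter_node_cost` are hypothetical illustrations.

```python
from collections import defaultdict, deque


def critical_path(edges):
    """Return (total_weight, path) of the heaviest source-to-sink path in a DAG.

    Assumption: edges is a list of (upstream, downstream, weight) tuples,
    where weight stands for the data-flow volume between the two tasks.
    """
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: produce a topological order of the tasks.
    order = []
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")

    # Longest-path dynamic programming over the topological order.
    best = {n: 0 for n in nodes}
    prev = {n: None for n in nodes}
    for u in order:
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v], prev[v] = best[u] + w, u

    end = max(order, key=lambda n: best[n])
    path, n = [], end
    while n is not None:
        path.append(n)
        n = prev[n]
    return best[end], path[::-1]


def assign_contiguous(path, num_workers):
    """Hypothetical placement: split the path into num_workers contiguous chunks
    whose sizes differ by at most one, so most adjacent tasks share a worker."""
    k, r = divmod(len(path), num_workers)
    placement, i = {}, 0
    for worker in range(num_workers):
        size = k + (1 if worker < r else 0)
        for task in path[i:i + size]:
            placement[task] = worker
        i += size
    return placement


def inter_node_cost(edges, placement):
    """Sum the weights of edges whose two placed endpoints sit on different workers."""
    return sum(w for u, v, w in edges
               if u in placement and v in placement and placement[u] != placement[v])


if __name__ == "__main__":
    # Hypothetical WordCount-like topology with made-up data-flow weights.
    edges = [("source", "flatMap", 10), ("flatMap", "keyBy", 8),
             ("keyBy", "count", 8), ("count", "sink", 2),
             ("source", "filter", 3), ("filter", "sink", 1)]
    weight, path = critical_path(edges)
    placement = assign_contiguous(path, num_workers=2)
    print(weight, path)
    print(placement, inter_node_cost(edges, placement))
```

With these made-up weights the heaviest path is source, flatMap, keyBy, count, sink (total weight 28). Splitting it over two workers as 3 + 2 tasks leaves only the keyBy to count edge (weight 8) crossing nodes. A real scheduler would also choose where to cut the path and account for parallel instances and slot capacity, which this sketch ignores.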
Authors
HE Zhen-zhen
YU Jiong
LI Zi-yang
GUO Bing-lei
Software College, Xinjiang University, Urumqi 830008, China; College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Source
Computer Engineering and Design
Peking University Core Journal
2020, Issue 5, pp. 1280-1287 (8 pages)
Funding
National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078)
Natural Science Foundation of Xinjiang Uygur Autonomous Region (2017D01A20)