Task scheduling strategy based on Flink environment (cited by: 6)
Abstract: To reduce the high communication overhead caused by the different types of communication among tasks on key nodes in the topology of a big data stream computing platform, a task scheduling strategy for the Flink environment is proposed. Edge weights in the topology are determined by the size of the data flow between tasks, and the directed acyclic graph is transformed into a topological critical path model. The inter-node communication overhead of critical tasks is minimized while the load difference among nodes on the critical path is kept small. Experimental results show that, compared with the existing task scheduling strategy of the Flink platform, the proposed strategy reduces the average computing latency by 13.09% when running the WordCount and TwitterSentiment jobs, effectively improving system performance.
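The critical path idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is only the standard longest-path computation on a weighted directed acyclic graph, where each edge weight stands in for the data-flow size between two tasks. The function name `critical_path`, the task names, and the weights are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque


def critical_path(edges):
    """Return (total_weight, path) for the heaviest path in a weighted DAG.

    edges: iterable of (upstream_task, downstream_task, weight) tuples.
    The weight is an assumed stand-in for the data-flow size between tasks.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, weight in edges:
        graph[src].append((dst, weight))
        indegree[dst] += 1
        nodes.update((src, dst))
    if not nodes:
        return 0, []

    # Kahn's algorithm: visit tasks in topological order.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    best = {n: 0 for n in nodes}        # heaviest path weight ending at each task
    parent = {n: None for n in nodes}   # predecessor on that heaviest path
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt, weight in graph[node]:
            if best[node] + weight > best[nxt]:
                best[nxt] = best[node] + weight
                parent[nxt] = node
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle; a DAG is required")

    # Walk back from the task with the largest accumulated weight.
    end = max(sorted(nodes), key=lambda n: best[n])
    path = []
    while end is not None:
        path.append(end)
        end = parent[end]
    return best[path[0]], path[::-1]


# Hypothetical WordCount-like topology; weights are made-up data-flow sizes.
example_edges = [
    ("source", "split", 40),
    ("split", "count_a", 25),
    ("split", "count_b", 35),
    ("count_a", "sink", 10),
    ("count_b", "sink", 5),
]
weight, path = critical_path(example_edges)
```

A scheduler in the spirit of the abstract would then try to co-locate the tasks on `path` so that the heaviest data flows avoid inter-node communication, subject to a load-balance constraint. How the paper enforces that constraint is not stated in the abstract, so it is left out of this sketch.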
Authors: HE Zhen-zhen (何贞贞); YU Jiong (于炯); LI Zi-yang (李梓杨); GUO Bing-lei (国冰磊) (Software College, Xinjiang University, Urumqi 830008, China; College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China)
Source: Computer Engineering and Design (《计算机工程与设计》), PKU Core Journal, 2020, No. 5, pp. 1280-1287 (8 pages)
Funding: National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2017D01A20).
Keywords: stream computing; Flink; critical path; communication cost; task scheduling
