面向Flink的负载均衡任务调度算法的研究与实现 (cited by: 6)

Research and implementation of a Flink-oriented load balancing task scheduling algorithm

Abstract: Apache Flink is one of the mainstream distributed computing engines for big data, and task scheduling is a key problem in distributed computing systems. Because clusters are heterogeneous and operators differ in complexity, load imbalance inevitably arises in Flink. To address this problem, a load balancing task scheduling algorithm based on resource feedback, named RFTS, is proposed. RFTS consists of three modules: real-time resource monitoring, area division, and a task scheduling algorithm based on glowworm swarm optimization. It reassigns tasks waiting on overloaded machines to lightly loaded machines, which balances the cluster load and improves cluster utilization and execution efficiency. Experiments on the TPC-C and TPC-H datasets show that RFTS effectively improves the performance of Apache Flink in terms of both execution time and throughput.
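
The data flow described in the abstract (monitor load, divide machines into areas, move waiting tasks off overloaded machines) can be illustrated with a minimal sketch. This is not the RFTS implementation: the glowworm swarm optimization scheduler is replaced here by a plain greedy choice of the least-loaded target, and the `Node` class, the thresholds `high` and `low`, and the uniform `task_cost` are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    load: int                                    # monitored load in percent (assumed metric)
    waiting: list = field(default_factory=list)  # ids of tasks waiting on this node


def divide_areas(nodes, high=80, low=40):
    """Split nodes into overloaded, normal, and lightly loaded areas."""
    over = [n for n in nodes if n.load > high]
    normal = [n for n in nodes if low <= n.load <= high]
    light = [n for n in nodes if n.load < low]
    return over, normal, light


def rebalance(nodes, task_cost=10, high=80, low=40):
    """Greedily move waiting tasks from overloaded nodes to light nodes.

    A greedy stand-in for the paper's glowworm swarm optimization step.
    task_cost is an assumed uniform load increase per migrated task.
    Returns the list of (task, source, target) moves.
    """
    over, _, light = divide_areas(nodes, high, low)
    moves = []
    for src in over:
        while src.waiting and light:
            # Pick the currently least-loaded node in the light area.
            dst = min(light, key=lambda n: n.load)
            if dst.load + task_cost > high:
                break  # no light node can absorb another task
            task = src.waiting.pop(0)
            dst.load += task_cost  # waiting tasks add load only where they run
            moves.append((task, src.name, dst.name))
    return moves


if __name__ == "__main__":
    cluster = [
        Node("a", 90, ["t1", "t2", "t3"]),
        Node("b", 20),
        Node("c", 30),
        Node("d", 60),
    ]
    for move in rebalance(cluster):
        print(move)
```

The light area is computed once per call, so a target node keeps receiving tasks until accepting one more would push it past `high`; a real scheduler would re-run monitoring and area division between rounds.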

Authors: LI Wen-jia; SHI Lan; JI Hang-xu; LUO Yi-peng (College of Computer Science and Engineering, Northeastern University, Shenyang 110169; School of Software, Liaoning University of Technology, Jinzhou 121000, China)

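Affiliations: College of Computer Science and Engineering, Northeastern University; School of Software, Liaoning University of Technology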

Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 7, pp. 1141-1151 (11 pages)

Funding: Key R&D Project of the Ministry of Science and Technology (2018YFB1004402).

Keywords: Apache Flink; load balancing task scheduling algorithm based on resource feedback; real-time resource monitoring; area division; glowworm swarm optimization algorithm

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

