Abstract
With the rapid development of big data and artificial intelligence, high-performance, real-time stream computing systems are gradually replacing traditional batch computing systems built on data warehouses. Apache Storm is an open-source, highly fault-tolerant distributed platform for real-time stream processing of big data. It supports several task assignment schemes, including an even task distribution strategy and a strategy that pins a task to a single specified machine. When a topology contains multiple tasks and only some machines in the cluster can execute a particular task, the traditional scheduling method can only assign a single task to a single designated machine, so the resources of the cluster as a whole are underused. Adjusting the task scheduling strategy addresses this: the queue of machines that satisfy the constraint is obtained, the available worker nodes in that queue are checked, and the designated task is distributed evenly across those nodes, while the other tasks are still assigned to the remaining machines in the cluster by the default strategy. The result is a grouped scheduling strategy for multiple tasks.
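The grouped scheduling idea summarized in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation and does not use the Apache Storm scheduler API. The function names `interleaved_slots` and `group_schedule`, the machine "tags", and the per-machine slot counts are all hypothetical, and plain round-robin stands in for Storm's default strategy on the remaining machines.

```python
from itertools import cycle


def interleaved_slots(names, machines):
    """List (machine, slot) pairs so that consecutive entries alternate machines."""
    depth = max((machines[n]["slots"] for n in names), default=0)
    return [(n, i) for i in range(depth) for n in names if i < machines[n]["slots"]]


def group_schedule(tasks, machines):
    """Hypothetical grouped scheduler.

    tasks:    list of (task_id, required_tag or None)
    machines: dict name -> {"tags": set of str, "slots": int}
    Returns a dict task_id -> (machine_name, slot_index).
    """
    assignment = {}
    reserved = set()

    # Group the designated tasks by the tag they require.
    by_tag = {}
    for task_id, tag in tasks:
        if tag is not None:
            by_tag.setdefault(tag, []).append(task_id)

    for tag, tagged_tasks in by_tag.items():
        # Step 1: build the queue of machines that satisfy the constraint.
        eligible = [n for n, m in machines.items() if tag in m["tags"]]
        # Step 2: collect the available worker slots on those machines.
        slots = interleaved_slots(eligible, machines)
        if not slots:
            raise ValueError(f"no available worker slot for tag {tag!r}")
        reserved.update(eligible)
        # Step 3: spread the designated tasks evenly (round-robin) over the slots.
        for task_id, slot in zip(tagged_tasks, cycle(slots)):
            assignment[task_id] = slot

    # Step 4: the other tasks go to the remaining machines. Round-robin is an
    # assumed stand-in for the default strategy.
    default_tasks = [t for t, tag in tasks if tag is None]
    remaining = [n for n in machines if n not in reserved]
    slots = interleaved_slots(remaining, machines)
    if default_tasks and not slots:
        raise ValueError("no remaining machine for default tasks")
    for task_id, slot in zip(default_tasks, cycle(slots)):
        assignment[task_id] = slot

    return assignment


if __name__ == "__main__":
    demo_machines = {
        "m1": {"tags": {"gpu"}, "slots": 2},
        "m2": {"tags": {"gpu"}, "slots": 2},
        "m3": {"tags": set(), "slots": 2},
    }
    demo_tasks = [("t1", "gpu"), ("t2", "gpu"), ("t3", None)]
    print(group_schedule(demo_tasks, demo_machines))
```

In this sketch a designated task group may use every eligible machine rather than a single pinned one, which is the resource-utilization point the abstract makes. Reserving all eligible machines for the designated group is a simplifying assumption of the sketch.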
Authors
WANG Zhong-Hua
CHAI Xiao-Li
The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
Source
Computer Systems & Applications (《计算机系统应用》)
2021, Issue 2, pp. 250-254 (5 pages)