Abstract
When a worker node (TaskTracker) has multiple local tasks available, the default scheduler executes them in the order in which they were discovered, which is inefficient. To optimize the scheduling of local tasks, this paper proposes a machine-learning-based optimization algorithm for Hadoop local task scheduling. The algorithm first selects and defines feature vectors related to the local tasks. It then trains a logistic regression model to obtain the weight of each feature, uses those weights to rank the tasks by priority, and continually updates the model according to overload rules. Experimental results show that the proposed algorithm improves the data locality of map tasks while also reducing job running time.
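The abstract does not list the features, the training procedure, or the overload rules, so the following is only a minimal sketch of the general idea: score each local task with a logistic regression model and run the highest-scoring task first. The class names `LocalTask` and `LogisticTaskRanker`, the two-element feature vectors, the learning rate, and the plain stochastic-gradient `update` step (used here as a stand-in for the paper's overload-rule update) are all assumptions made for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class LocalTask:
    task_id: str
    # Hypothetical, already-normalized features; the abstract does not name them.
    features: Sequence[float]


class LogisticTaskRanker:
    """Ranks local tasks by a logistic regression score (illustrative only)."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr  # assumed learning rate

    def score(self, features: Sequence[float]) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)

    def update(self, features: Sequence[float], label: int) -> None:
        # One stochastic gradient step on the log-likelihood.
        # This is a generic stand-in for the paper's overload-rule update.
        error = label - self.score(features)
        self.weights = [w + self.lr * error * x
                        for w, x in zip(self.weights, features)]
        self.bias += self.lr * error

    def rank(self, tasks: Sequence[LocalTask]) -> List[LocalTask]:
        # Highest predicted priority first.
        return sorted(tasks, key=lambda t: self.score(t.features), reverse=True)


# Toy usage: tasks whose first feature is set are labeled "should run first".
ranker = LogisticTaskRanker(n_features=2)
for _ in range(200):
    ranker.update([1.0, 0.0], 1)
    ranker.update([0.0, 1.0], 0)

tasks = [LocalTask("t1", [0.0, 1.0]), LocalTask("t2", [1.0, 0.0])]
ordered = [t.task_id for t in ranker.rank(tasks)]
```

In this toy setup the learned weight for the first feature becomes positive and the second negative, so `t2` is ranked ahead of `t1`. A real scheduler would replace the discovery-order queue on each worker node with such a ranked list.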
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2017, No. 3, pp. 727-729, 755 (4 pages)
Application Research of Computers
Funding
National Public Welfare Scientific Research Special Project (201310162, 201210022)
Lianyungang Science and Technology Support Program (SH1110)