Hadoop local tasks scheduling optimization algorithm based on logistic regression model

Cited by: 7
Abstract: When a worker node (TaskTracker) has several local tasks available to run, the default scheduler executes them in the order in which they were discovered, which is inefficient. To optimize the scheduling of local tasks, this paper proposes a machine-learning-based Hadoop local task scheduling optimization algorithm. It first selects and defines feature vectors related to the tasks. It then trains a logistic regression model on these vectors to obtain the weight of each vector, ranks the tasks by priority accordingly, and continuously updates the model through overload rules. Experimental results show that the proposed algorithm improves the data locality of map tasks while also reducing job running time.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2017, No. 3, pp. 727-729, 755 (4 pages in total)
Funding: National Public Welfare Scientific Research Special Project (201310162, 201210022); Lianyungang Science and Technology Support Program (SH1110)
Keywords: Hadoop; MapReduce; local tasks scheduling; task priority; overload rules; logistic regression model
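
The pipeline described in the abstract (feature vectors, learned weights, priority ranking, model updates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not list the features or spell out the overload rules, so the feature layout, the `LocalTask` and `LogisticPriorityModel` names, and the 0/1 feedback label (1 = the task ran without overloading the node, 0 = the node was overloaded) are all assumptions made for illustration. The update step is a plain stochastic gradient step on the logistic loss.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class LocalTask:
    """A local map task with a hypothetical, pre-normalized feature vector."""
    task_id: str
    features: List[float]  # assumed layout; the abstract does not list the features


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticPriorityModel:
    """Scores tasks with a logistic regression model and ranks them by score."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, task: LocalTask) -> float:
        """Return a priority in (0, 1); higher means run earlier."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, task.features))
        return sigmoid(z)

    def rank(self, tasks: List[LocalTask]) -> List[LocalTask]:
        """Order the node's local tasks from highest to lowest priority."""
        return sorted(tasks, key=self.score, reverse=True)

    def update(self, task: LocalTask, label: int) -> None:
        """One gradient step on the logistic loss.

        label is an assumed feedback signal standing in for the paper's
        overload rules: 1 if the task ran without overloading the node,
        0 if the node became overloaded.
        """
        error = self.score(task) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, task.features)
        ]
        self.bias -= self.learning_rate * error


if __name__ == "__main__":
    model = LogisticPriorityModel(n_features=2)
    pending = [LocalTask("m_0001", [0.2, 0.9]), LocalTask("m_0002", [0.8, 0.1])]
    model.update(pending[1], 1)  # assumed feedback: m_0002 ran without overload
    model.update(pending[0], 0)  # assumed feedback: m_0001 overloaded the node
    print([t.task_id for t in model.rank(pending)])
```

Updating one observation at a time keeps the sketch close to the abstract's description of a model that is refreshed continuously, rather than retrained in batches.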
