

Runtime Prediction of Jobs for Backfilling Optimization
Abstract: High performance computing clusters commonly use traditional job scheduling strategies such as First Come First Served (FCFS). FCFS is fair and simple to implement, but it tends to leave idle resource fragments. Backfilling is a widely used remedy: it lets short, small jobs run ahead to fill resources that would otherwise sit idle while the system waits for large jobs. Good backfilling, however, usually needs each job's expected runtime before the job runs, and users are either unwilling to provide one or tend to give an estimate much longer than the actual runtime so that the system does not terminate their jobs. It is therefore necessary to predict job runtimes from job properties. The Vienna Ab initio Simulation Package (VASP) is one of the most widely used high performance computing applications in China. By analyzing the characteristics of VASP jobs, we parse and extract a corresponding set of job features and propose IRPA, a Bayesian-based add-on (secondary) prediction model for the runtime of VASP jobs. We further propose BRBF, a hybrid prediction model that combines radial basis network branches with Bayesian classification. Both models are validated on historical VASP job data from the TC4600 platform at the Supercomputing Center of the University of Science and Technology of China (USTC). Compared with several baseline methods, IRPA and BRBF prove effective and achieve higher prediction accuracy at a coarse granularity.
Authors: WU Gui-bao; SHEN Yu; ZHANG Wen-shuai; LIAO Sha-sha; WANG Qi-qi; LI Jing (University of Science and Technology of China, Hefei 230026, China; Supercomputing Center of University of Science and Technology of China, Hefei 230026, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2019, No. 1, pp. 6-12 (7 pages)
Funding: Supported by the National Key Research and Development Program of China (2016YFB0201402)
Keywords: high performance computing; resource fragmentation; backfilling; add-on prediction; radial basis
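
The backfilling idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's IRPA or BRBF model and not any particular scheduler's implementation. It only shows why a runtime estimate is needed: a waiting job may run ahead only if it fits in the currently idle nodes and is predicted to finish before the large job at the head of the queue is due to start. The names `Job`, `predicted_runtime`, `pick_backfill_jobs`, and `head_start_time` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    nodes: int                # number of nodes the job requests
    predicted_runtime: float  # predicted runtime in seconds (assumed to come from some predictor)


def pick_backfill_jobs(queue: List[Job], free_nodes: int,
                       now: float, head_start_time: float) -> List[Job]:
    """Pick waiting jobs that can run ahead without delaying the head job.

    Simplified rule (an assumption of this sketch): a job is backfilled only if
    it fits in the idle nodes and its predicted finish time is no later than
    the time at which the head job is scheduled to start.
    """
    chosen = []
    for job in queue:
        fits = job.nodes <= free_nodes
        finishes_in_time = now + job.predicted_runtime <= head_start_time
        if fits and finishes_in_time:
            chosen.append(job)
            free_nodes -= job.nodes  # these nodes are now occupied
    return chosen


# Jobs waiting behind the large head job, in arrival order.
waiting = [
    Job("a", nodes=8, predicted_runtime=100.0),  # too wide for the idle nodes
    Job("b", nodes=2, predicted_runtime=500.0),  # would overrun the head job's start
    Job("c", nodes=2, predicted_runtime=50.0),
    Job("d", nodes=3, predicted_runtime=10.0),   # no longer fits once "c" is placed
    Job("e", nodes=2, predicted_runtime=200.0),
]
selected = pick_backfill_jobs(waiting, free_nodes=4, now=0.0, head_start_time=200.0)
print([job.name for job in selected])
```

If the predicted runtimes are much longer than the real ones, as user-supplied estimates often are according to the abstract, jobs such as "b" are rejected even when they would in fact have fit, which is the motivation for predicting runtimes automatically.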
