
Scientific Workflow Dataset Layout Based on Task Assignment and Dataset Replicas

Cited by: 3
Abstract: Data Layout (DL) for Scientific Workflows (SW) in cloud environments has become an active topic in workflow research. Analysis of the many-to-many relationship between tasks and datasets in a scientific workflow shows that different data layout schemes incur different data transmission costs, which strongly affects the running cost of the workflow. To reduce dataset transmission costs, this paper proposes a scientific workflow data layout method based on task assignment and dataset replicas. The method starts from task assignment: it computes task dependency quantitatively and assigns tasks on that basis. According to the assignment result, it then applies a two-stage data layout method based on dataset replicas to optimize transmission costs while the workflow runs. Results on an example show that, compared with the workflow layer method, the proposed method effectively reduces the running cost of scientific workflows.
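The abstract's central observation, that different layouts of the same datasets lead to different transmission costs, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `transfer_volume` and `shared_volume`, the toy workflow, the two data centers `A` and `B`, and the simplification that cost equals the volume of data a task must fetch from another site. In particular, `shared_volume` is only one plausible stand-in for a task dependency measure; the abstract does not give the paper's actual formula.

```python
from typing import Dict, Set


def transfer_volume(
    task_site: Dict[str, str],
    uses: Dict[str, Set[str]],
    sizes: Dict[str, float],
    placement: Dict[str, Set[str]],
) -> float:
    """Total data volume moved when every task reads all of its datasets.

    Assumption: a dataset is transferred once per task that needs it,
    unless a copy (original or replica) already sits at the task's site.
    """
    total = 0.0
    for task, datasets in uses.items():
        site = task_site[task]
        for d in datasets:
            if site not in placement[d]:
                total += sizes[d]
    return total


def shared_volume(t1: str, t2: str, uses: Dict[str, Set[str]],
                  sizes: Dict[str, float]) -> float:
    """Hypothetical dependency measure: total size of datasets both tasks use."""
    return sum(sizes[d] for d in uses[t1] & uses[t2])


# Toy workflow (hypothetical): tasks and datasets are many-to-many.
uses = {"t1": {"d1", "d2"}, "t2": {"d2", "d3"}, "t3": {"d3"}}
sizes = {"d1": 10.0, "d2": 40.0, "d3": 5.0}
task_site = {"t1": "A", "t2": "B", "t3": "B"}

# Two single-copy layouts and one layout with a replica of d2 at site B.
layout_one = {"d1": {"A"}, "d2": {"A"}, "d3": {"B"}}
layout_two = {"d1": {"B"}, "d2": {"B"}, "d3": {"A"}}
layout_replica = {"d1": {"A"}, "d2": {"A", "B"}, "d3": {"B"}}

for name, layout in [("one", layout_one), ("two", layout_two),
                     ("replica", layout_replica)]:
    print(name, transfer_volume(task_site, uses, sizes, layout))
```

In this toy case the two single-copy layouts move 40 and 60 units of data, while replicating `d2` at site `B` removes the remaining transfer. The sketch ignores the storage and creation cost of the replica itself, which a real cost model would have to weigh against the saved transfers.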
Authors: SHANG Lei (尚蕾); LIU Xiping (刘茜萍) (Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2020, Issue 5, pp. 122-130, 138 (10 pages)
Funding: National Natural Science Foundation of China (61602260).
Keywords: cloud environment; Scientific Workflow (SW); task assignment; dataset replicas; Data Layout (DL); transmission cost

