Abstract
In cloud computing environments, how the data files of a data-intensive scientific workflow are distributed across multiple data centers strongly affects the workflow's execution efficiency. Based on the dependencies among the workflow's datasets, and taking into account the differences in processing capacity among the data centers and in network performance between them, this paper proposes a data placement strategy and a placement-aware task scheduling strategy that improve workflow execution performance. Analysis and simulation show that the strategies effectively reduce data transfer across data centers at run time and shorten the workflow's running time, thereby improving overall execution efficiency.
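To make the quantity the abstract refers to concrete, the following is a minimal sketch of how cross-data-center transfer volume could be measured for a given placement. It is not the paper's algorithm: the function name `cross_center_transfer`, the data structures, and the sample sizes are all assumptions introduced for illustration only.

```python
from typing import Dict, List, Tuple


def cross_center_transfer(
    tasks: Dict[str, Tuple[str, List[str]]],
    placement: Dict[str, str],
    sizes: Dict[str, float],
) -> float:
    """Sum the sizes of input datasets stored away from the task that reads them.

    tasks:     task name -> (data center running the task, input dataset names)
    placement: dataset name -> data center storing the dataset
    sizes:     dataset name -> dataset size (arbitrary unit)
    All three structures are hypothetical, not taken from the paper.
    """
    total = 0.0
    for center, inputs in tasks.values():
        for dataset in inputs:
            # A dataset stored elsewhere must be moved to the task's data center.
            if placement[dataset] != center:
                total += sizes[dataset]
    return total


# Hypothetical workflow: two tasks on two data centers, sharing dataset d2.
tasks = {"t1": ("dc1", ["d1", "d2"]), "t2": ("dc2", ["d2", "d3"])}
sizes = {"d1": 10.0, "d2": 50.0, "d3": 5.0}

# Placement that ignores dependencies versus one that keeps datasets near their readers.
scattered = {"d1": "dc2", "d2": "dc2", "d3": "dc1"}
grouped = {"d1": "dc1", "d2": "dc1", "d3": "dc2"}

print(cross_center_transfer(tasks, scattered, sizes))
print(cross_center_transfer(tasks, grouped, sizes))
```

In this toy case the dependency-aware placement moves less data than the scattered one. A fuller model in the spirit of the abstract would also weight each transfer by the bandwidth between data centers and account for each center's processing capacity.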
Source
Computer Simulation (《计算机仿真》)
CSCD
Peking University Core Journal (北大核心)
2015, Issue 3, pp. 421-425, 437 (6 pages)
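Funding
National Natural Science Foundation of China (61462076)
Gansu Province Science and Technology Support Program (1104GKCA023)
Gansu Province Key Science and Technology Project (1208RJZA134)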
Keywords
Cloud computing
Scientific workflow
Data dependence
Data placement
Task scheduling