Abstract: Data streaming applications, usually composed of sequential/parallel data processing tasks organized as a workflow, bring new challenges to workflow scheduling and resource allocation in grid environments. Because data volumes are high and storage capacity is relatively limited, resource allocation and data streaming have to be storage aware. To improve system performance, data streaming and processing also have to run concurrently. This study uses a genetic algorithm (GA) for workflow scheduling, driven by on-line measurements and predictions from a gray model (GM). On-demand data streaming is used to avoid data overflow through repertory strategies. Tests show that tasks with on-demand data streaming must be balanced to improve overall performance, to avoid system bottlenecks and backlogs of intermediate data, and to increase data throughput for the data processing workflows as a whole.
Funding: Supported by the National Natural Science Foundation of China (No. 60803017); the National High-Tech Research and Development (863) Program of China (Nos. 2006AA10Z237, 2007AA01Z179, and 2008AA01Z118); the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry; and the FIT Foundation of Tsinghua University.
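The abstract mentions predictions with a gray model (GM) but does not say which variant is used. As a rough illustration only, the sketch below assumes the common GM(1,1) form and forecasts the next values of a short, positive measurement series. The function name `gm11_forecast` and its parameters are hypothetical and do not come from the paper.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1).

    Illustrative sketch under the assumption that GM means GM(1,1);
    not the authors' implementation.
    """
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")

    # 1. Accumulated generating operation (AGO): running sums of the raw data.
    x1 = []
    total = 0.0
    for value in series:
        total += value
        x1.append(total)

    # 2. Background values: means of consecutive accumulated points.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = [float(v) for v in series[1:]]

    # 3. Least-squares fit of y = -a * z + b (simple linear regression).
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    denom = m * szz - sz * sz
    if denom == 0:
        raise ValueError("degenerate series: background values are all equal")
    slope = (m * szy - sz * sy) / denom
    a = -slope                    # development coefficient
    b = (sy - slope * sz) / m     # grey input

    # 4. A (near-)zero development coefficient means a flat series.
    if abs(a) < 1e-12:
        return [b] * steps

    # 5. Time response of the accumulated series, then difference it back.
    def x1_hat(k):
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]
```

For a roughly exponential series such as `[10, 12, 14.4, 17.28, 20.736]`, `gm11_forecast(...)[0]` lands close to the geometric continuation of about 24.88. GM(1,1) is usually chosen for short series like this because it needs only a handful of samples, which is one plausible reason for pairing it with on-line measurements.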