Funding: Supported by the National Key Research and Development Program of China (2020YFB1506703), the National Natural Science Foundation of China (Grant No. 62072018), the State Key Laboratory of Software Development Environment (SKLSDE-2021ZX-06), and the Fundamental Research Funds for the Central Universities.
Abstract: In recent years, the demand for real-time data processing has been increasing, and various stream processing systems have emerged. When the amount of data input to a stream processing system fluctuates, the computing resources required by the stream processing job also change. The resources used by stream processing jobs therefore need to be adjusted according to load changes to avoid wasting computing resources. Existing works adjust stream processing jobs on the assumption that there is a linear relationship between operator parallelism and operator resource consumption (e.g., throughput), which leads to significant deviation as operator parallelism increases. This paper proposes a nonlinear model to represent operator performance. We divide operator performance into three stages: the Non-competition stage, the Non-full competition stage, and the Full competition stage. Given the parallelism of an operator, our performance model can accurately predict its CPU utilization and throughput. Evaluated with actual experiments, the prediction error of our model is below 5%. We also propose a quick accurate auto-scaling (QAAS) method that uses the operator performance model to automatically scale the operator parallelism of a Flink job. Compared to previous work, QAAS is able to maintain stable job performance under load changes, minimizing the number of job adjustments and reducing data backlogs by 50%.
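The difference between a linear assumption and a staged model can be illustrated with a small sketch. The abstract does not give the model's equations, so everything below is an illustrative assumption rather than the paper's model or the QAAS method: the class `StagedOperatorModel`, its parameters (`rate_per_instance`, `no_competition_limit`, `full_competition_limit`, `contended_gain`), and the piecewise shape (linear growth, then reduced gain per instance, then a plateau) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StagedOperatorModel:
    """Hypothetical three-stage throughput model (NOT the paper's equations)."""

    rate_per_instance: float      # records/s one instance handles without contention
    no_competition_limit: int     # assumed end of the Non-competition stage
    full_competition_limit: int   # assumed start of the Full competition stage
    contended_gain: float         # assumed fraction of rate gained per extra instance
                                  # in the Non-full competition stage (0..1)

    def __post_init__(self) -> None:
        if not 1 <= self.no_competition_limit <= self.full_competition_limit:
            raise ValueError("expected 1 <= no_competition_limit <= full_competition_limit")

    def throughput(self, parallelism: int) -> float:
        """Predicted throughput for a given operator parallelism."""
        if parallelism < 1:
            raise ValueError("parallelism must be >= 1")
        # Stage 1 (Non-competition): every instance contributes its full rate.
        uncontended = min(parallelism, self.no_competition_limit)
        total = uncontended * self.rate_per_instance
        # Stage 2 (Non-full competition): extra instances contribute a reduced rate.
        contended = min(parallelism, self.full_competition_limit) - self.no_competition_limit
        if contended > 0:
            total += contended * self.rate_per_instance * self.contended_gain
        # Stage 3 (Full competition): instances beyond the limit add nothing.
        return total

    def min_parallelism(self, target_rate: float, max_parallelism: int) -> Optional[int]:
        """Smallest parallelism whose predicted throughput meets target_rate, if any."""
        for p in range(1, max_parallelism + 1):
            if self.throughput(p) >= target_rate:
                return p
        return None


def linear_estimate(rate_per_instance: float, parallelism: int) -> float:
    """The linear assumption: throughput grows proportionally with parallelism."""
    return rate_per_instance * parallelism


if __name__ == "__main__":
    model = StagedOperatorModel(
        rate_per_instance=1000.0,
        no_competition_limit=4,
        full_competition_limit=8,
        contended_gain=0.5,
    )
    for p in (2, 6, 12):
        print(p, linear_estimate(1000.0, p), model.throughput(p))
```

With these made-up parameters, the two estimates agree at low parallelism and drift apart as parallelism grows, which is the kind of deviation the abstract attributes to the linear assumption. A scaler built on such a model could call `min_parallelism` once per load change instead of adjusting the job step by step, although how QAAS actually chooses the parallelism is not described in the abstract.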