Abstract
To make resource utilization in real-time ETL more reasonable when the data production rate fluctuates widely, this paper proposes an elastic ETL scheduling mechanism based on stable matching. The mechanism first predicts the data production rate of the ETL data source and computes the data consumption rate needed to meet the predicted value. It then uses a greedy load-balancing algorithm to adjust the number of ETL services so that the load across nodes is balanced. Finally, it determines the matching between ETL operations so that the data consumption rate is maximized at minimum cost. The ETL operation matching problem is transformed into a minimum-cost maximum-flow problem, and an improved algorithm based on the Dinic algorithm is proposed. Experimental results show that the scheduling mechanism has advantages in resource usage.
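As a rough illustration of the reduction mentioned in the abstract, the sketch below solves a small minimum-cost maximum-flow instance in Python. It uses the textbook successive-shortest-path method, not the paper's improved Dinic-based algorithm, which the abstract does not detail. The graph layout (upstream operations on one side, ETL services on the other), the capacities read as data rates, the per-unit costs, and the class name MinCostMaxFlow are all assumptions made for this example.

```python
from collections import deque


class MinCostMaxFlow:
    """Successive-shortest-path min-cost max-flow (illustrative sketch only)."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # node -> indices of outgoing edges
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index; its residual twin is index ^ 1.
        self.adj[u].append(len(self.to))
        self.to.append(v); self.cap.append(cap); self.cost.append(cost)
        self.adj[v].append(len(self.to))
        self.to.append(u); self.cap.append(0); self.cost.append(-cost)

    def solve(self, s, t):
        flow = total_cost = 0
        while True:
            # Queue-based Bellman-Ford: cheapest augmenting path in the residual graph.
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for e in self.adj[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev_edge[t] == -1:
                return flow, total_cost  # no augmenting path left
            # Find the bottleneck capacity along the path, then push it.
            push, v = float("inf"), t
            while v != s:
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            v = t
            while v != s:
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total_cost += push * dist[t]


# Hypothetical instance: node 0 = source, 1-2 = upstream operations,
# 3-4 = ETL services, 5 = sink. Capacities stand for data rates.
net = MinCostMaxFlow(6)
net.add_edge(0, 1, 5, 0)
net.add_edge(0, 2, 3, 0)
net.add_edge(1, 3, 4, 2)
net.add_edge(1, 4, 5, 1)
net.add_edge(2, 3, 3, 1)
net.add_edge(2, 4, 3, 3)
net.add_edge(3, 5, 4, 0)
net.add_edge(4, 5, 4, 0)
flow, cost = net.solve(0, 5)
print(flow, cost)
```

In this toy instance the maximum flow plays the role of the largest achievable consumption rate, and the total cost is the price of the matching that attains it. A Dinic-style approach would instead augment along many paths per phase.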
Authors
Liu Xuanlü (刘旋律)
Gu Jinguang (顾进广)
School of Computer and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Key Laboratory of Intelligent Information Processing and Real-time Industrial System in Hubei Province (Wuhan University of Science and Technology), Wuhan 430065, Hubei, China; Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, National Press and Publication Administration, Beijing 100038, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2022, No. 2, pp. 266-273 (8 pages)
Funding
National Natural Science Foundation of China (61673304).
Keywords
Real-time ETL
Elastic scheduling
Stable matching
Minimum-cost maximum flow