Abstract
Spark is a popular distributed data-processing framework, and its resource scheduling strategy and resource utilization directly determine the efficiency and speed of cluster computation. To address Spark's resource scheduling problem, we propose a new scheduling algorithm that builds on the resource factors Spark already considers, namely memory and the number of free cores. The algorithm monitors the resource usage of worker nodes in real time and additionally takes each node's CPU processing speed and residual CPU utilization into account when rescheduling and allocating resources. It optimizes Spark for use as a Web service that must handle highly concurrent requests with low-latency responses, and it reduces the skew in resource utilization that arises when these factors are ignored, thereby improving resource utilization. Experimental results show that the improved resource scheduling algorithm performs well.
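The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea: rank worker nodes by the four factors it names (free memory, free cores, CPU processing speed, residual CPU utilization). The `WorkerNode` class, the `rank_workers` function, the normalization, and the equal weighting are all assumptions made for illustration; they are not the paper's algorithm and not part of Spark's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerNode:
    # Hypothetical snapshot of a worker, as a monitor might report it.
    name: str
    free_memory_mb: int
    free_cores: int
    cpu_speed_ghz: float
    cpu_utilization: float  # current load in the range 0.0 to 1.0


def rank_workers(nodes: List[WorkerNode],
                 required_memory_mb: int,
                 required_cores: int) -> List[WorkerNode]:
    """Return eligible workers, best candidate first.

    Assumption: each factor is normalized against the best eligible node
    and the four factors are weighted equally.
    """
    # Keep only nodes that can satisfy the request at all.
    eligible = [n for n in nodes
                if n.free_memory_mb >= required_memory_mb
                and n.free_cores >= required_cores]
    if not eligible:
        return []

    # Normalization bases; fall back to 1 to avoid division by zero.
    max_mem = max(n.free_memory_mb for n in eligible) or 1
    max_cores = max(n.free_cores for n in eligible) or 1
    max_speed = max(n.cpu_speed_ghz for n in eligible) or 1.0

    def score(n: WorkerNode) -> float:
        residual_cpu = 1.0 - n.cpu_utilization
        return (n.free_memory_mb / max_mem
                + n.free_cores / max_cores
                + n.cpu_speed_ghz / max_speed
                + residual_cpu) / 4.0

    return sorted(eligible, key=score, reverse=True)


if __name__ == "__main__":
    cluster = [
        WorkerNode("a", 4096, 4, 2.0, 0.9),  # same memory and cores as b, but busy and slower
        WorkerNode("b", 4096, 4, 3.0, 0.2),
        WorkerNode("c", 512, 1, 3.5, 0.0),   # too small for the request
    ]
    for node in rank_workers(cluster, required_memory_mb=1024, required_cores=2):
        print(node.name)
```

In this sketch, nodes "a" and "b" are indistinguishable if only memory and free cores are compared; adding CPU speed and residual utilization is what separates them, which is the kind of skew the abstract describes.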
Source
《计算机工程与科学》
CSCD
PKU Core Journal (北大核心)
2016, No. 8, pp. 1550-1556 (7 pages)
Computer Engineering & Science
Funding
National Natural Science Foundation of China (61272420)