Abstract
The distributed in-memory computing platform Spark is the latest technological advance in massive data processing. Under dynamic resource allocation, Spark adds and removes executors according to the application's workload. However, removing an executor discards the data cached on it and leads to unnecessary recomputation cost, a situation that is particularly common in Spark interactive data query applications. To minimize executor removal and thereby improve query efficiency, this paper designs and implements a prediction-based dynamic resource allocation strategy for Spark. The strategy uses Markov theory to build a model that predicts how long the inactive periods of a Spark interactive data query application will last, and it decides when to remove executors according to the prediction. Experimental results show that, compared with Spark's existing dynamic resource allocation strategy, the prediction-based strategy improves the efficiency of Spark interactive data queries by 59.34% on average.
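The idea of predicting inactive-period duration with a Markov model can be illustrated with a minimal sketch. This is not the paper's implementation: the bucket boundaries, the first-order chain, the function names, and the release policy below are all assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical bucket boundaries (seconds) used to discretise idle durations.
# States 0..2 are the buckets; state 3 means "longer than 600 s".
BUCKET_UPPER_BOUNDS = [30, 120, 600]


def to_state(idle_seconds):
    """Map an idle duration to a discrete state index."""
    for state, bound in enumerate(BUCKET_UPPER_BOUNDS):
        if idle_seconds <= bound:
            return state
    return len(BUCKET_UPPER_BOUNDS)


def fit_transitions(idle_history):
    """Count first-order transitions between consecutive idle-period states."""
    states = [to_state(d) for d in idle_history]
    counts = defaultdict(Counter)
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next_state(counts, last_idle_seconds):
    """Return the most frequent successor state, or None if the state is unseen."""
    successors = counts.get(to_state(last_idle_seconds))
    if not successors:
        return None
    return successors.most_common(1)[0][0]


def should_release_executors(counts, last_idle_seconds, release_from_state=2):
    """Assumed policy: release only when the predicted idle period is long."""
    predicted = predict_next_state(counts, last_idle_seconds)
    if predicted is None:
        return True  # no evidence: fall back to releasing, as a fixed timeout would
    return predicted >= release_from_state


# Example: mostly short pauses between queries, with occasional long breaks.
history = [10, 20, 15, 700, 12, 25, 18, 900, 14]
model = fit_transitions(history)
keep_cache = not should_release_executors(model, last_idle_seconds=14)
```

In stock Spark the equivalent decision is governed by fixed timeouts such as `spark.dynamicAllocation.executorIdleTimeout` and `spark.dynamicAllocation.cachedExecutorIdleTimeout`; a predictor like the sketch above would replace the fixed value with a per-application estimate.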
Authors
梁毅
程石帆
常世禄
刘飞
LIANG Yi; CHENG Shi-fan; CHANG Shi-lu; LIU Fei (Computer Academy, Beijing University of Technology, Beijing 100124, China)
Source
《软件导刊》
2018, No. 12, pp. 43-47 (5 pages)
Software Guide