Abstract
High capital investment and low resource utilization in datacenters have long been a concern for cloud providers. A straightforward remedy is to co-locate more applications on the same hardware to improve resource efficiency. However, contention for shared resources among co-located applications causes performance interference, which degrades application performance, quality of service (QoS), and user satisfaction. Guaranteeing application performance has therefore become a key issue in co-location scenarios. We review research on guaranteeing the performance of co-located applications, covering the background of co-location, its challenges, and key technologies. The related work is organized into four aspects: application and cluster characterization (the foundation), interference detection (the prerequisite), server-level resource allocation (micro-level policy), and cluster-level job scheduling (macro-level policy). Because application and cluster characteristics differ across co-location scenarios, the challenges and problem complexity of performance guarantee work differ as well. For example, the number of applications co-located on a unit of resource directly affects the time cost of searching the resource space, and the way applications run affects the intensity of contention for shared resources. From the perspective of problem complexity, we therefore discuss and analyze the challenges facing related work along three dimensions: application and cluster characteristics, resource interference dimensions, and the number of co-located applications. Finally, we discuss future research directions and challenges for guaranteeing application performance in high-density co-location scenarios. We argue that a full-stack, software/hardware co-designed approach is the trend for guaranteeing performance under high deployment density, and that this approach can help provide predictable performance and high resource efficiency in the datacenter.
Authors
郭静
胡存琛
包云岗
Guo Jing; Hu Cunchen; Bao Yungang (Research Center for Advanced Computer System, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190)
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal
2024, No. 1, pp. 43-65 (23 pages)
Journal of Computer Research and Development
Funding
Guangdong Key Research and Development Program (2020B010164003).
Keywords
co-location
performance guarantee
quality of service (QoS)
resource sharing
resource isolation
interference detection
resource management
job scheduling