Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61872401 and No. 62132022, the Fok Ying Tung Education Foundation (No. 171059), and the BUPT Excellent Ph.D. Students Foundation (No. CX2021102).
Abstract: Congestion control (CC) has always been an important issue in networking, and enthusiasm for research on it has never diminished in either academia or industry. In recent years, owing to the rapid development of machine learning (ML), combining reinforcement learning (RL) with CC has produced striking results. However, these complicated schemes generalize poorly and are too heavyweight in storage and computation to be deployed directly on mobile devices. To address these problems, we propose Plume, a high-performance, lightweight, and generalized RL-CC scheme. Plume uses a lightweight framework to reduce overhead while preserving the original performance. In addition, Plume modifies the framework parameters of the reward function during retraining so that the algorithm can be applied to a variety of scenarios. Evaluation results show that Plume retains almost all the performance of the original model while reducing its size and decision latency by more than 50% and 20%, respectively. Moreover, Plume performs better in some special scenarios.
Funding: This project is partially supported by the National Natural Science Foundation of China (No. 61872401) and the Fok Ying Tung Education Foundation (No. 171059).
Abstract: Variable network performance in the cloud hurts application performance. This increases the tenant’s cost and has become the key hindrance to cloud adoption. The cause is that virtual machines (VMs) belonging to one tenant can reside on multiple physical servers, and communication interference across tenants occasionally occurs under network congestion. To prevent such unpredictability, it is critical for cloud providers to offer guaranteed network performance at the tenant level. This issue has drawn increasing attention in both academia and industry. Many elaborate mechanisms have been proposed to provide guaranteed network performance, such as guaranteed bandwidth or bounded message delay across tenants. However, owing to their intrinsic complexity and the limited capabilities of commodity hardware, deploying these mechanisms still faces great challenges in current cloud datacenters. Moreover, the rapid development of new technologies offers new opportunities to improve the performance of existing work, but these possibilities have not yet been fully discussed. Therefore, in this paper, we survey the latest developments in network performance guarantee approaches and summarize them based on their features. Then, we explore and discuss the possibilities of using emerging technologies as knobs to improve performance or overcome the inherent shortcomings of existing approaches. We hope this article will help readers quickly understand the causes of the problems and serve as a guide to motivate researchers to develop innovative algorithms and frameworks.
Received: Apr. 07, 2020. Revised: Oct. 23, 2020. Editor: Haifeng Zheng.