The proliferation of the global datasphere has forced cloud storage systems to evolve more complex architectures for different applications. The emergence of these application session requests and system daemon services has created large persistent flows with diverse performance requirements that need to coexist with other types of traffic. Current routing methods such as equal-cost multipath (ECMP) and Hedera take neither specific traffic characteristics nor performance requirements into consideration, which makes it difficult for these methods to meet the quality of service (QoS) requirements of high-priority flows. In this paper, we formulated the selection of the best routing for different kinds of cloud storage flows as an integer programming problem and utilized grey relational analysis (GRA) to solve this optimization problem. The resulting method is a GRA-based service-aware flow scheduling (GRSA) framework that considers requested flow types and network status to select appropriate routing paths for flows in cloud storage datacenter networks. The results of experiments carried out on a real traffic trace show that the proposed GRSA method can better balance traffic loads, conserve table space and reduce the average transmission delay for high-priority flows compared to ECMP and Hedera.
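To make the GRA step concrete, here is a minimal sketch of how grey relational analysis can rank candidate paths against an ideal reference sequence. The metric set (available bandwidth, hop count, latency), the weights, and the normalization are illustrative assumptions, not the GRSA paper's exact formulation.

```python
# Minimal GRA ranking sketch. The metrics (available bandwidth, hop count, latency),
# the weights, and the normalization are illustrative assumptions, not the GRSA
# paper's exact formulation.

def gra_rank(paths, weights, rho=0.5):
    """paths: {name: (avail_bw_gbps, hop_count, latency_us)} -> (best_name, scores)."""
    names = list(paths)
    bw   = [paths[n][0] for n in names]
    hops = [paths[n][1] for n in names]
    lat  = [paths[n][2] for n in names]

    def larger_better(xs):            # normalize to [0, 1]; 1 is best
        lo, hi = min(xs), max(xs)
        return [1.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

    def smaller_better(xs):
        lo, hi = min(xs), max(xs)
        return [1.0 if hi == lo else (hi - x) / (hi - lo) for x in xs]

    # Distance of each path from the ideal reference sequence (all ones).
    norm = zip(larger_better(bw), smaller_better(hops), smaller_better(lat))
    deltas = [[1.0 - v for v in row] for row in norm]
    dmin = min(min(row) for row in deltas)
    dmax = max(max(row) for row in deltas)

    def grade(row):                   # grey relational grade: weighted coefficients
        return sum(w * (dmin + rho * dmax) / (d + rho * dmax)
                   for w, d in zip(weights, row))

    scores = {n: grade(row) for n, row in zip(names, deltas)}
    return max(scores, key=scores.get), scores

best, scores = gra_rank(
    {"p1": (8.0, 3, 110.0), "p2": (6.0, 2, 90.0), "p3": (9.5, 5, 180.0)},
    weights=(0.4, 0.3, 0.3),
)
print(best, scores)   # with these weights the short, low-latency path "p2" ranks highest
```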
Datacenters have played an increasingly essential role as the underlying infrastructure in cloud computing. As implied by the essence of cloud computing, resources in these datacenters are shared by multiple competing entities, which can be either tenants that rent virtual machines (VMs) in a public cloud such as Amazon EC2, or applications that embrace data parallel frameworks like MapReduce in a private cloud maintained by Google. It has been generally observed that with traditional transport-layer protocols allocating link bandwidth in datacenters, network traffic from competing applications interferes with each other, resulting in a severe lack of predictability and fairness of application performance. Such a critical issue has drawn a substantial amount of recent research attention on bandwidth allocation in datacenter networks, with a number of new mechanisms proposed to efficiently and fairly share a datacenter network among competing entities. In this article, we present an extensive survey of existing bandwidth allocation mechanisms in the literature, covering the scenarios of both public and private clouds. We thoroughly investigate their underlying design principles, evaluate the trade-offs involved in their design choices and summarize them in a unified design space, with the hope of conveying some meaningful insights for better designs in the future.
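As background for the fairness goal these mechanisms share, the sketch below implements one of the simplest baseline policies, per-flow max-min fairness, via progressive filling: all unfrozen flows grow at the same rate until some link saturates, which freezes every flow crossing it. The link capacities and flow routes are made-up values, and no surveyed mechanism is reproduced here.

```python
# Illustrative only: per-flow max-min fairness via progressive filling, a baseline
# allocation policy this literature commonly compares against. The capacities and
# flow routes below are assumed values.

def max_min_fair(link_capacity, flow_links):
    """link_capacity: {link: capacity}; flow_links: {flow: set of links it traverses}."""
    rate = {f: 0.0 for f in flow_links}
    residual = dict(link_capacity)
    frozen = set()
    while len(frozen) < len(flow_links):
        # Flows still growing on each link, and the equal share each link could still give.
        active = {l: [f for f in flow_links if f not in frozen and l in flow_links[f]]
                  for l in residual}
        shares = [residual[l] / len(fs) for l, fs in active.items() if fs]
        if not shares:
            break
        inc = min(shares)
        for f in flow_links:
            if f not in frozen:
                rate[f] += inc
                for l in flow_links[f]:
                    residual[l] -= inc
        for l, fs in active.items():              # freeze flows on saturated links
            if fs and residual[l] <= 1e-9:
                frozen.update(fs)
    return rate

print(max_min_fair({"L1": 10.0, "L2": 10.0},
                   {"A": {"L1"}, "B": {"L1", "L2"}, "C": {"L2"}}))
# {'A': 5.0, 'B': 5.0, 'C': 5.0}
```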
The layer 2 network technology is extending beyond its traditional local area implementation and finding wider acceptance in providers' metropolitan area networks and large-scale cloud data center networks. This is mainly due to its plug-and-play capability and native mobility support. Much effort has been put into increasing the bisection bandwidth of a layer 2 network, which has been constrained by the spanning tree protocol that a layer 2 network uses to prevent looping. The recent trend is to incorporate layer 3's routing approach into a layer 2 network so that multiple paths can be used for forwarding traffic between any source-destination (S-D) node pair. ECMP (equal-cost multipath) is one such example. However, ECMP may still be limited in generating multiple paths due to its shortest-path (lowest-cost) requirement. In this paper, we consider a non-shortest-path routing approach, called EPMP (Equal Preference Multi-Path), that can generate more paths than ECMP. EPMP is based on ordered semigroup algebra. In EPMP routing, paths that differ in traditionally defined costs, such as hops and bandwidth, can be made equally preferred and thus become candidate paths. We found that, in comparison with ECMP, EPMP routing not only generates more paths and provides higher bisection bandwidth, but also allows bottleneck links in a hierarchical network to be identified when different traffic patterns are applied. EPMP is also versatile in that it can use various ways of calculating path preference to control the number and length of paths, making it flexible (like policy-based routing) yet objective (like shortest-path-first routing) in calculating preferred paths.
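The toy sketch below illustrates the key effect described above, without reproducing the ordered-semigroup algebra itself: relaxing the strict minimum-cost requirement enlarges the candidate-path set. ECMP keeps only minimum-hop paths, while the assumed "equal preference" stand-in also keeps paths within one extra hop of the minimum; the topology, node names, and slack value are assumptions.

```python
# Toy comparison of candidate-path sets (not the paper's ordered-semigroup algebra):
# ECMP keeps only minimum-hop paths; an "equal preference" stand-in also keeps paths
# within a small hop slack of the minimum. The topology below is assumed.

def simple_paths(adj, src, dst, path=None):
    path = path or [src]
    if src == dst:
        yield list(path)
        return
    for nxt in adj[src]:
        if nxt not in path:
            yield from simple_paths(adj, nxt, dst, path + [nxt])

# Two edge switches joined directly and through two core switches; one host on each edge.
adj = {
    "A": ["e0"], "B": ["e1"],
    "e0": ["A", "e1", "c0", "c1"],
    "e1": ["B", "e0", "c0", "c1"],
    "c0": ["e0", "e1"], "c1": ["e0", "e1"],
}
paths = list(simple_paths(adj, "A", "B"))
min_hops = min(len(p) - 1 for p in paths)

ecmp_paths = [p for p in paths if len(p) - 1 == min_hops]        # shortest paths only
equal_pref = [p for p in paths if len(p) - 1 <= min_hops + 1]    # one extra hop allowed

print(len(ecmp_paths), len(equal_pref))   # 1 vs 3: relaxing "shortest" yields more paths
```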
Although dense interconnection datacenter networks (DCNs), e.g., Fat Tree, provide multiple paths and high bisection bandwidth for each server pair, the widely used single-path Transmission Control Protocol (TCP) and equal-cost multipath (ECMP) protocols cannot achieve high resource utilization due to poor resource exploitation and allocation. In this paper, we present LESSOR, a performance-oriented multipath forwarding scheme that improves DCNs' resource utilization. By adopting an OpenFlow-based centralized control mechanism, LESSOR computes a near-optimal transmission path and bandwidth provision for each flow according to the global network view, while maintaining a nearly real-time network view with its performance-oriented flow observing mechanism. Deployments and comprehensive simulations show that LESSOR efficiently improves network throughput, exceeding ECMP by 4.9%–38.3% under different loads. LESSOR also provides a 2%–27.7% throughput improvement over Hedera. Besides, LESSOR decreases the average flow completion time significantly.
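The abstract does not spell out LESSOR's path-computation objective, so the sketch below only illustrates the centralized idea with a common stand-in heuristic: given a global view of link loads, place each new flow on the candidate path that minimizes the worst link utilization after placement. The link names, capacities, loads, and demand are assumptions.

```python
# Assumption: this is not LESSOR's actual algorithm, only a stand-in for the centralized
# idea in the abstract. With a global view of link loads, the controller places each new
# flow on the candidate path that minimizes the worst link utilization after placement.

def pick_path(candidate_paths, link_load, link_capacity, flow_demand):
    """candidate_paths: list of paths, each a list of link ids; loads/capacities in Gb/s."""
    def worst_utilization(path):
        return max((link_load[l] + flow_demand) / link_capacity[l] for l in path)
    return min(candidate_paths, key=worst_utilization)

link_capacity = {"l1": 10.0, "l2": 10.0, "l3": 10.0, "l4": 10.0}
link_load     = {"l1": 7.0,  "l2": 1.0,  "l3": 2.0,  "l4": 2.0}
candidates    = [["l1", "l2"], ["l3", "l4"]]

chosen = pick_path(candidates, link_load, link_capacity, flow_demand=2.0)
for l in chosen:                 # the controller then updates its global view
    link_load[l] += 2.0
print(chosen)                    # ['l3', 'l4']: the path avoiding the already-hot l1
```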
Efficient resource utilization requires that emerging datacenter interconnects support both high performance communication and efficient remote resource sharing. These goals require that the network be more tightly coupled with the CPU chips. Designing a new interconnection technology thus requires considering not only the interconnection itself, but also the design of the processors that will rely on it. In this paper, we study memory hierarchy implications for the design of high-speed datacenter interconnects, particularly as they affect remote memory access, and we use PCIe as the vehicle for our investigations. To that end, we build three complementary platforms: a PCIe-interconnected prototype server with which we measure and analyze current bottlenecks; a software simulator that lets us model microarchitectural and cache hierarchy changes; and an FPGA prototype system with a streamlined, switchless, customized protocol, Thunder, with which we study hardware optimizations outside the processor. We highlight several architectural modifications to better support remote memory access and communication, and quantify their impact and limitations.
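As a rough illustration of why small remote memory accesses over a PCIe-attached fabric tend to be bound by per-transaction latency rather than by link bandwidth, here is a back-of-envelope model; every number (per-hop latency, link rate, round-trip count) is an assumption for illustration, not a measurement from the paper.

```python
# Back-of-envelope model; every number here (per-hop latency, link rate, round trips)
# is an assumption for illustration, not a measurement from the paper.

def remote_access_us(payload_bytes, round_trips=1,
                     per_hop_latency_us=0.9, link_gbps=64.0):
    serialization_us = payload_bytes * 8 / (link_gbps * 1e3)   # 1 Gb/s = 1e3 bits per microsecond
    return round_trips * 2 * per_hop_latency_us + serialization_us

for size in (64, 4096, 65536):
    print(f"{size:6d} B  ->  {remote_access_us(size):7.3f} us")
# Small accesses are dominated by the fixed round-trip latency, not by link bandwidth.
```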
Layer 2 network technology is extending beyond its traditional local area implementation and finding wider acceptance in providers' metropolitan area networks and large-scale cloud data center networks. This is mainly due to its plug-and-play capability and native mobility support. Much effort has been put into increasing the bisection bandwidth of a layer 2 network, which has been constrained by the spanning tree protocol (STP) that a layer 2 network uses to prevent looping. The recent trend is to incorporate layer 3's routing approach into a layer 2 network so that multiple paths can be used for forwarding traffic between any source-destination (S-D) node pair. Equal-cost multipath (ECMP) is one such example. However, ECMP may still be limited in generating multiple paths due to its shortest-path (lowest-cost) requirement. In this paper, we consider a non-shortest-path routing approach, called equal preference multipath (EPMP) and based on ordered semigroup theory, which can generate more paths than ECMP. In EPMP routing, paths with different traditionally defined costs, such as hops and bandwidth, can now be treated as equally preferred and thus become equal candidate paths. Comparative tests with ECMP show that EPMP routing not only generates more paths and provides 15% higher bisection bandwidth, but also identifies bottleneck links in a hierarchical network when different traffic patterns are applied. EPMP is also more flexible in controlling the number and length of the generated paths. Simulation results indicate the effectiveness of the proposed algorithm, which serves as a good reference for the non-blocking operation of large datacenter networks.
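Bisection bandwidth, the headline metric in the comparison above, can be estimated for a given topology by a max-flow computation between two halves of the hosts. The sketch below does this with networkx on a toy ring; the topology, capacities, and host split are assumptions, not the paper's evaluation setup.

```python
# Sketch of estimating bisection bandwidth with a max-flow computation (networkx).
# The ring topology, capacities, and host split below are assumed values for
# illustration, not the paper's evaluation setup.
import networkx as nx

def bisection_bandwidth(g, left_hosts, right_hosts):
    """Max total flow from left_hosts to right_hosts over edge 'capacity' attributes."""
    f = g.copy()
    for h in left_hosts:
        f.add_edge("SRC", h)            # no capacity attribute = unlimited
    for h in right_hosts:
        f.add_edge(h, "SNK")
    value, _ = nx.maximum_flow(f, "SRC", "SNK")
    return value

# Four switches in a ring with 1 Gb/s links and one host per switch.
g = nx.DiGraph()
for u, v in [("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s0")]:
    g.add_edge(u, v, capacity=1.0)
    g.add_edge(v, u, capacity=1.0)
for i in range(4):
    g.add_edge(f"h{i}", f"s{i}", capacity=1.0)
    g.add_edge(f"s{i}", f"h{i}", capacity=1.0)

print(bisection_bandwidth(g, ["h0", "h1"], ["h2", "h3"]))   # 2.0 for this toy ring
```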
Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed on top of a lossless underlying network and cannot endure a high packet loss rate. However, besides switch buffer overflow, there is another kind of packet loss in RDMA networks, namely packet corruption, which has not been discussed in depth. Packet corruption incurs long application tail latency by causing timeout retransmissions. The challenges in solving packet corruption in RDMA networks are that 1) packet corruption is inevitable regardless of remedial mechanisms and 2) RDMA hardware is not programmable. This paper proposes designs that can guarantee the expected tail latency of applications in the presence of packet corruption. The key idea is to control the probability of timeout events caused by packet corruption by transforming timeout retransmissions into out-of-order retransmissions. We build a probabilistic model to estimate the occurrence probabilities and real effects of the corruption patterns. We implement these two mechanisms with the help of programmable switches and the zero-byte message RDMA feature. We build an ns-3 simulation and implement the optimization mechanisms on our testbed. The simulation and testbed experiments show that the optimizations can decrease flow completion time by several orders of magnitude, with less than 3% bandwidth cost, at different packet corruption rates.
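To illustrate, rather than reproduce, the kind of probabilistic reasoning the abstract mentions, the sketch below estimates how often an n-packet message sees at least one corrupted packet and compares the expected penalty when that loss triggers a timeout retransmission versus an out-of-order retransmission; the timeout, RTT, and message-size values are assumptions.

```python
# Illustrative probability model (assumed values, not the paper's exact model): how often
# an n-packet message sees at least one corrupted packet, and the expected latency penalty
# if that loss causes a timeout retransmission versus an out-of-order retransmission.

def p_any_corruption(pkt_corruption_rate, n_packets):
    return 1.0 - (1.0 - pkt_corruption_rate) ** n_packets

RTO_US, RTT_US, N_PACKETS = 4000.0, 10.0, 1000      # assumed timeout, RTT, message size
for rate in (1e-6, 1e-4, 1e-2):
    p = p_any_corruption(rate, N_PACKETS)
    print(f"rate={rate:g}  P(>=1 corrupted)={p:.4f}  "
          f"timeout penalty ~ {p * RTO_US:8.1f} us   ooo penalty ~ {p * RTT_US:6.3f} us")
```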