The sixth-generation (6G) mobile network is a multi-network, multi-scenario environment in which multiple network domains break their original fixed boundaries to form connections and converge. With the optimization objective of maximizing network utility while ensuring performance-centric weighted fairness among flows, this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration. Because the utilities of different flows conflict, the fair bandwidth allocation problem for various types of flows is formulated using different reward functions. Regarding the tradeoff between fairness and utility, this paper addresses the corresponding reward functions for the cases where flows change abruptly and where they change smoothly. In addition, to accommodate the Quality of Service (QoS) requirements of multiple types of flows, this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG. By introducing a Long Short-Term Memory (LSTM) layer in the actor and critic networks, more information about temporal continuity is added, further enhancing the ability to adapt to changes in the dynamic network environment. The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithm in experiments on real network topology and traffic traces, and the results show that LSTM+MADDPG improves the delay convergence speed by 14.6% and delays the onset of packet loss by 18.2% compared with other algorithms.
Data center networks may comprise tens or hundreds of thousands of nodes and, naturally, suffer from frequent software and hardware failures as well as link congestion. Packets are routed along the shortest paths with sufficient resources to facilitate efficient network utilization and minimize delays. In such dynamic networks, links frequently fail or get congested, making the recalculation of the shortest paths a computationally intensive problem. Various routing protocols were proposed to overcome this problem by focusing on network utilization rather than speed. Surprisingly, the design of fast shortest-path algorithms for data centers was largely neglected, though they are universal components of routing protocols. Moreover, parallelization techniques were mostly deployed for random network topologies, and not for the regular topologies that are often found in data centers. The aim of this paper is to improve scalability and reduce the time required for the shortest-path calculation in data center networks by parallelization on general-purpose hardware. We propose a novel algorithm that parallelizes edge relaxations as a faster and more scalable solution for popular data center topologies.
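As a rough illustration of why edge relaxations lend themselves to parallelization, the sketch below runs Bellman-Ford-style synchronous rounds: within a round, every edge is relaxed against the same snapshot of distances, so the per-edge work is independent. This is not the algorithm proposed in the abstract above, whose details are not given here; the function names `relax_round` and `shortest_paths` and the toy topology are assumptions made for the example, and the loop is written sequentially for clarity.

```python
import math

def relax_round(edges, dist):
    """Relax every edge once against a fixed snapshot of dist.

    Each (u, v, w) relaxation reads only the snapshot, so the iterations
    are independent and could be distributed across workers.
    """
    new_dist = dict(dist)
    for u, v, w in edges:
        candidate = dist[u] + w
        if candidate < new_dist[v]:
            new_dist[v] = candidate
    return new_dist

def shortest_paths(nodes, edges, source):
    """Single-source shortest paths via repeated synchronous rounds."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    # At most len(nodes) - 1 rounds are needed for non-negative weights.
    for _ in range(len(nodes) - 1):
        new_dist = relax_round(edges, dist)
        if new_dist == dist:  # converged early
            break
        dist = new_dist
    return dist

# Hypothetical directed toy topology: (from, to, cost).
nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 5), ("c", "d", 1)]
result = shortest_paths(nodes, edges, "a")
```

For bidirectional links, each link would be listed once per direction in `edges`.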
In data centers, transmission control protocol (TCP) incast causes catastrophic goodput degradation for applications with a many-to-one traffic pattern. In this paper, we aim to tame incast at the receiver-side application. Towards this goal, we first develop an analytical model that formulates the incast probability as a function of connection variables and network environment settings. We combine the model with optimization theory and derive insights into minimizing the incast probability by tuning connection variables related to applications. Then, guided by the analytical results, we propose an adaptive application-layer solution to TCP incast. The solution allocates advertised windows equally to concurrent connections and dynamically adapts the number of concurrent connections to the varying conditions. Simulation results show that our solution consistently avoids incast and achieves high goodput in various scenarios, including ones with multiple bottleneck links and background TCP traffic.
As a critical infrastructure of cloud computing, data center networks (DCNs) directly determine the service performance of data centers, which provide computing services for various applications such as big data processing and artificial intelligence. However, current data center network architectures suffer from long routing paths and low fault tolerance between source and destination servers, which makes it hard to satisfy the requirements of high-performance data center networks. Based on dual-port servers and the Clos network structure, this paper proposes a novel architecture, RClos, to construct high-performance data center networks. Logically, the proposed architecture is constructed by inserting a dual-port server into each pair of adjacent switches in the switch fabric, where switches are connected in the form of a ring Clos structure. We describe the structural properties of RClos in terms of network scale, bisection bandwidth, and network diameter. The RClos architecture inherits the characteristics of its embedded Clos network, which can accommodate a large number of servers with a small average path length. The proposed architecture offers high fault tolerance and is suitable for constructing various data center networks. For example, the average path length between servers is 3.44 and the standardized bisection bandwidth is 0.8 in RClos(32,5). Numerical experiments show that RClos enjoys a small average path length and high network fault tolerance, which is essential in the construction of high-performance data center networks.
Many "rich-connected" topologies with multiple parallel paths between servers have recently been proposed for data center networks to provide high bisection bandwidth, but it remains challenging to fully utilize the high network capacity with appropriate multi-path routing algorithms. As flow-level path splitting may lead to traffic imbalance between paths due to differences in flow size, packet-level path splitting has attracted more attention lately; it spreads packets from flows across multiple available paths and significantly improves link utilization. However, it may cause packet reordering, confusing the TCP congestion control algorithm and lowering the throughput of flows. In this paper, we design a novel packet-level multi-path routing scheme called SOPA, which leverages OpenFlow to perform packet-level path splitting in a round-robin fashion, and hence significantly mitigates the packet reordering problem and improves network throughput. Moreover, SOPA leverages the topological features of data center networks to encode a very small number of switches along the path into the packet header, resulting in very light overhead. Compared with random packet spraying (RPS), Hedera, and equal-cost multi-path routing (ECMP), our simulations demonstrate that SOPA achieves 29.87%, 50.41%, and 77.74% higher network throughput respectively under permutation workload, and reduces average data transfer completion time by 53.65%, 343.31%, and 348.25% respectively under production workload.
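The round-robin packet-level splitting mentioned in the abstract above can be pictured with a minimal sketch: consecutive packets of one flow are assigned to the available paths in cyclic order. This only illustrates the round-robin idea; it does not model OpenFlow, SOPA's header encoding, or reordering behavior, and the function name `spray_round_robin` and the path labels are assumptions made for the example.

```python
from itertools import cycle

def spray_round_robin(packets, paths):
    """Pair each packet with the next path in cyclic order."""
    path_iter = cycle(paths)
    return [(pkt, next(path_iter)) for pkt in packets]

# Hypothetical flow of five packets (sequence numbers) over two paths.
assignment = spray_round_robin([0, 1, 2, 3, 4], ["p0", "p1"])
```

With equally sized packets, each path carries an almost equal share of the flow, differing by at most one packet.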
With the emergence of diverse applications in data centers, the demands on quality of service have also become diverse, such as high throughput for elephant flows and low latency for deadline-sensitive flows. However, traditional TCP variants are ill-suited to such situations and often result in inefficient data transfers (e.g., missed flow deadlines and throughput collapse). This further degrades the user-perceived quality of service (QoS) in data centers. To reduce the flow completion time of mice and deadline-sensitive flows while improving the throughput of elephant flows, this paper proposes an efficient, deadline-aware priority-driven congestion control (PCC) protocol, which grants mice and deadline-sensitive flows the highest priority. Specifically, PCC computes the priority of each flow according to the size of the transmitted data, the remaining data volume, and the flow's deadline. PCC then adjusts the congestion window according to the flow priority and the degree of network congestion. Furthermore, switches in data centers control the input and output of packets based on the flow priority and the queue length. Unlike existing TCP variants, to speed up the data transfers of mice and deadline-sensitive flows, PCC provides an effective method to compute and encode the flow priority explicitly. According to the flow priority, switches can manage packets efficiently and ensure the data transfers of high-priority flows through weighted priority scheduling with minor modification. The experimental results show that PCC can improve the data transfer performance of mice and deadline-sensitive flows while guaranteeing the throughput of elephant flows.
Network updates have become increasingly prevalent since the broad adoption of software-defined networks (SDNs) in data centers. Modern TCP designs, including the cutting-edge TCP variants DCTCP, CUBIC, and BBR, however, are not resilient to network updates that provoke flow rerouting. In this paper, we first demonstrate that popular TCP implementations perform inadequately in the presence of frequent and inconsistent network updates, because such updates result in out-of-order packets and packet drops induced by transitory congestion and lead to serious performance deterioration. We look into the causes and propose a network update-friendly TCP (NUFTCP), which is an extension of the DCTCP variant, as a solution. Simulations are used to assess the proposed NUFTCP. Our findings reveal that NUFTCP can more effectively manage the problems of out-of-order packets and packet drops triggered by network updates, and it outperforms DCTCP considerably.
Cloud Datacenter Network (CDN) providers usually have the option to scale their network structures to allow for far more resource capacity, though such scaling options may come with exponential costs that contradict their utility objectives. Besides the cost of the physical assets and network resources, such scaling may also impose more load on the electricity power grids to feed the added nodes with the energy required to run and cool them, which comes with extra costs too. Thus, CDN providers who utilize their resources better can offer their services at lower unit prices than others who simply choose the scaling solutions. Resource utilization is quite challenging; indeed, clients of CDNs usually tend to exaggerate their true resource requirements when they lease their resources. Service providers are committed to their clients through Service Level Agreements (SLAs). Therefore, any amendment to the resource allocations needs to be approved by the clients first. In this work, we propose deploying a Stackelberg leadership framework to formulate a negotiation game between the cloud service providers and their client tenants. Through this, the providers seek to retrieve leased but unused resources from their clients. Cooperation is not expected from the clients, and they may ask high unit prices to return their extra resources to the provider's premises. Hence, to motivate cooperation in such a non-cooperative game, we developed an incentive-compatible pricing model for the returned resources as an extension of Vickrey auctions. Moreover, we also proposed building a behavior belief function that shapes the way of negotiation and compensation for each client. Compared to other benchmark models, the assessment results show that our proposed models provide timely negotiation schemes, allowing for better resource utilization rates, higher utilities, and grid-friendly CDNs.
In the rising tide of the Internet of things, more and more things in the world are connected to the Internet. Recently, data have kept growing at a rate more than four times that predicted by Moore's law. This explosion of data comes from various sources such as mobile phones, video cameras, and sensor networks, which often present multidimensional characteristics. The huge amount of data brings many challenges to the management, transportation, and processing IT infrastructures. To address these challenges, state-of-the-art large-scale data center networks have begun to provide cloud services that are increasingly prevalent. However, how to build a good data center remains an open challenge. Concurrently, the architecture design, which significantly affects the total performance, is of great research interest. This paper surveys advances in data center network design. We first introduce the upcoming trends in the data center industry. Then we review some popular design principles for today's data center network architectures. In the third part, we present some up-to-date data center frameworks and make a comprehensive comparison of them. During the comparison, we observe that there is no so-called optimal data center and that the design should differ with respect to data placement, replication, processing, and query processing. After that, several existing challenges and limitations are discussed. Based on these observations, we point out some possible future research directions.
Data Center Networks (DCNs) are the fundamental infrastructure for cloud computing. Driven by the massive parallel computing tasks in cloud computing, one-to-many data dissemination has become one of the most important traffic patterns in DCNs. Many architectures and protocols have been proposed to meet this demand. However, these proposals either require complicated configurations on switches and servers or cannot deliver optimal performance. In this paper, we propose peer-assisted data dissemination for DCNs. This approach utilizes the rich physical connections with high bandwidths and multi-path connections to facilitate efficient one-to-many data dissemination. We prove that an optimal P2P data dissemination schedule exists for FatTree, a specially designed DCN architecture. We then present a theoretical analysis of this algorithm in the general multi-rooted tree topology, a widely used DCN architecture. Additionally, we explore the performance of an intuitive line structure for data dissemination. Our analysis and experimental results show that this simple structure achieves performance comparable to the optimal algorithm. Since DCN applications rely heavily on virtualization to achieve optimal resource sharing, we present a general implementation method for the proposed algorithms, which aims to mitigate the impact of the potentially high churn rate of the virtual machines.
In modern data centers, the power consumed by the network is a notable portion of the total energy budget, and thus improving the energy efficiency of data center networks (DCNs) truly matters. One effective way to achieve this is to make the size of DCNs elastic with respect to traffic demands through flow consolidation and bandwidth scheduling, i.e., turning off unnecessary network components to reduce power consumption. Meanwhile, with its native support for data center management, software-defined networking (SDN) provides a paradigm for elastically controlling the resources of DCNs. To achieve such power savings, most prior efforts adopt simple greedy heuristics to reduce computational complexity. However, due to the inherent limitations of greedy algorithms, a good-enough optimization cannot always be guaranteed. To address this problem, a modified hybrid genetic algorithm (MHGA) is employed to improve the solution's accuracy, and the fine-grained routing function of SDN is fully leveraged. The simulation results show that more efficient power management can be achieved than in previous studies, increasing network energy savings by about 5%.
Cloud data centers now provide a plethora of rich online applications such as web search, social networking, and cloud computing. A key challenge for such applications, however, is to meet soft real-time constraints. Due to the deadline-agnostic congestion control in Transmission Control Protocol (TCP), many deadline-sensitive flows cannot finish transmission before their deadlines. In this paper, we propose an SDN-based Explicit-Deadline-aware TCP (SED) for cloud Data Center Networks (DCN). SED assigns a base rate to non-deadline flows first and gives as much spare bandwidth as possible to the deadline flows. Subsequently, a Retransmission-enhanced SED (RSED) is introduced to solve the packet-loss timeout problem. Through our experiments, we show that SED can make flows meet deadlines effectively, and that it significantly outperforms previous protocols in the cloud data center environment.
Ethernet link aggregation, which provides an easy and cost-effective way to increase both bandwidth and link availability between a pair of devices, is well suited for data center networks. However, all the traffic splitting algorithms used in existing Ethernet link aggregation are flow-level, and they do not work well owing to the traffic characteristics of data centers. Though frame-level traffic splitting can achieve optimal load balance and the maximum benefit from aggregated capacity, it is generally deprecated because of frame disordering, which can disrupt the operation of many Internet protocols, most notably transmission control protocol (TCP). To address this issue, we first investigate the causes of frame disordering in link aggregation and find that all of them either are no longer true or can be prevented in data centers. Then we present a byte-counter frame-level traffic splitting algorithm that achieves optimal performance while causing no frame disordering. The only requirement is that frames in a flow are the same size, which can be easily met in data centers. Simulation results show that the proposed frame-level traffic splitting method achieves higher throughput and optimal load balance. The completion time of different-sized flows is reduced by 24% on average and by up to 46%.
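To make the idea of byte-counter frame-level splitting more concrete, here is a minimal sketch under a stated assumption: each frame goes to the member link that has carried the fewest bytes so far. The abstract above does not spell out the authors' actual rule, so this least-bytes policy, the function name `split_frames`, and the frame sizes are all hypothetical.

```python
def split_frames(frame_sizes, num_links):
    """Assign frames to links, always choosing the link with the lowest byte counter.

    Ties are broken in favor of the lowest link index.
    Returns the per-frame link assignment and the final byte counters.
    """
    counters = [0] * num_links
    assignment = []
    for size in frame_sizes:
        link = min(range(num_links), key=lambda i: counters[i])
        counters[link] += size
        assignment.append(link)
    return assignment, counters

# Hypothetical flow of six equal-size 1500-byte frames over three links.
assignment, counters = split_frames([1500] * 6, 3)
```

When all frames in a flow are the same size, as the abstract requires, this rule reduces to a strict rotation over the links, which keeps the byte load on every link equal.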
Elastic interconnection now enables high-rate data transmission among data centers (DCs), which has given rise to the elastic data center network (EDCN). In EDCNs, it is essential to achieve virtual network (VN) embedding, which includes two main components: virtual machine (VM) mapping and virtual link (VL) mapping. In VM mapping, we allocate appropriate servers to hold VMs, while in VL mapping, an optimal substrate path is determined for each virtual lightpath. For VN embedding in EDCNs, power efficiency is a significant concern, and some solutions have been proposed that put lightly loaded servers to sleep. However, the increasing communication traffic between VMs leads to a serious energy dissipation problem, since switches also consume a great amount of energy even when the energy-efficient optical transmission technique is used. In this paper, considering load balancing and power-efficient VN embedding, we formulate the problem and design a novel heuristic for EDCNs, with the objective of achieving power savings on servers and switches. In our solution, VMs are mapped into a single DC or multiple DCs a short distance from each other, and servers in the same cluster or adjacent clusters are preferred to hold VMs. As a result, a large number of servers and switches become vacant and can go into sleep mode. Simulation results demonstrate that our method performs well in terms of power savings and load balancing. Compared with benchmarks, the improvement ratio of power efficiency is 5%–13%.
To support the needs of ever-growing cloud-based services, the number of servers and network devices in data centers is increasing exponentially, which in turn results in high complexity and difficulty in network optimization. Machine learning (ML) provides an effective way to deal with these challenges by enabling network intelligence. To this end, numerous creative ML-based approaches have been put forward in recent years. Nevertheless, the intelligent optimization of data center networks (DCN) still faces enormous challenges. To the best of our knowledge, there is a lack of systematic and original investigations with in-depth analysis of intelligent DCN. In this paper, we therefore investigate the application of ML to DCN optimization and provide a general overview and in-depth analysis of recent works, covering flow prediction, flow classification, and resource management. Moreover, we give unique insights into the technology evolution of the fusion of DCN and ML, together with some challenges and future research opportunities.
1 Introduction
The history of data centers can be traced back to the 1960s. Early data centers were deployed on mainframes that were time-shared by users via remote terminals. The boom in data centers came during the internet era. Many companies started building large internet-connected facilities,
The primary focus of this paper is to design a progressive restoration plan for an enterprise data center environment following a partial or full disruption. Repairing and restoring disrupted components in an enterprise data center requires a significant amount of time and human effort. Following a major disruption, the recovery process involves multiple stages, and during each stage, the partially recovered infrastructure can provide limited services to users at some degraded service level. However, how quickly and efficiently an enterprise infrastructure can be recovered depends on how the recovery mechanism restores the disrupted components, considering the inter-dependencies between services along with the limitations of expert human operators. The entire problem turns out to be NP-hard and rather complex, and we devise an efficient meta-heuristic to solve it. By considering some real-world examples, we show that the proposed meta-heuristic provides very accurate results and still runs 600-2800 times faster than the optimal solution obtained from a general-purpose mathematical solver [1].
To address the high operating costs and substantial energy waste in current data center network architectures, we propose a trusted flow preemption scheduling scheme combined with an energy-saving routing mechanism, based on a typical data center network architecture. The mechanism lets each network flow use its own exclusive link bandwidth and transmission path, which improves link utilization and network energy efficiency. Meanwhile, we apply trusted computing to guarantee a highly secure, high-performance, and highly fault-tolerant routing and forwarding service, which helps improve the average completion time of network flows.
According to Cisco's Internet Report 2020 white paper, there will be 29.3 billion connected devices worldwide by 2023, up from 18.4 billion in 2018. 5G connections will generate nearly three times more traffic than 4G connections. While this growth brings a boom to the network, it also presents unprecedented challenges in terms of flow forwarding decisions. The path assignment mechanism used in traditional traffic scheduling methods tends to cause local network congestion through the concentration of elephant flows, resulting in unbalanced network load and degraded quality of service. Using the centralized control of software-defined networks, this study proposes a data center traffic scheduling strategy for minimizing congestion and guaranteeing quality of service (MCQG). The ideal transmission path is selected for data flows while considering the network congestion rate and quality of service. Different traffic scheduling strategies are used according to the characteristics of different service types in data centers. Elephant flows, which tend to cause local congestion, are rerouted. The path evaluation function is formed from the maximum link utilization on the path, the number of elephant flows, and the time delay, and the fast optimum-seeking capability of the sparrow search algorithm is used to find the path with the lowest actual link overhead as the rerouting path for the elephant flows, reducing the possibility of local network congestion. Equal-cost multi-path (ECMP) protocols with faster response times are used to schedule mouse flows with shorter durations. This guarantees the quality of service of the network and achieves isolated transmission of the various types of data streams. The experimental results show that the proposed strategy has higher throughput, better network load balancing, and better robustness compared to ECMP under different traffic models. In addition, because it can fully utilize the resources in the network, MCQG also outperforms another traffic scheduling strategy that reroutes elephant flows (namely Hedera). Compared with ECMP and Hedera, MCQG improves average throughput by 11.73% and 4.29%, and normalized total throughput by 6.74% and 2.64%, respectively; MCQG improves link utilization by 23.25% and 15.07%; in addition, the average round-trip delay and packet loss rate fluctuate significantly less than with the two compared strategies.
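The path evaluation described in the abstract above combines three quantities: the maximum link utilization on the path, the number of elephant flows, and the delay. A minimal sketch of such a scoring function follows, assuming a simple weighted sum; the weights, field names, and the functions `path_cost` and `pick_reroute_path` are hypothetical, and an exhaustive minimum over a handful of candidates stands in for the sparrow search algorithm used in the paper.

```python
def path_cost(max_link_util, elephant_flows, delay_ms, weights=(1.0, 0.1, 0.01)):
    """Weighted sum of the three path metrics; lower is better.

    The weights are illustrative assumptions, not values from the paper.
    """
    w_util, w_flows, w_delay = weights
    return w_util * max_link_util + w_flows * elephant_flows + w_delay * delay_ms

def pick_reroute_path(candidates):
    """Return the candidate path with the lowest cost (brute-force search)."""
    return min(
        candidates,
        key=lambda p: path_cost(p["util"], p["elephants"], p["delay_ms"]),
    )

# Hypothetical candidate paths for one elephant flow.
candidates = [
    {"name": "A", "util": 0.9, "elephants": 3, "delay_ms": 2.0},
    {"name": "B", "util": 0.4, "elephants": 1, "delay_ms": 5.0},
]
best = pick_reroute_path(candidates)
```

Here path B wins despite its higher delay, because the assumed weights penalize utilization and elephant-flow concentration more heavily; a metaheuristic search becomes relevant only when the candidate set is too large to enumerate.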
In a data center network (DCN), load balancing is required when servers transfer data on the same path, in order to avoid congestion. Load balancing is challenged by dynamically changing transfer demands and complex routing control. Because of the distributed nature of a traditional network, previous research on load balancing has mostly focused on improving the performance of the local network; thus, the load has not been optimally balanced across the entire network. In this paper, we propose a novel dynamic load-balancing algorithm for fat-tree. This algorithm avoids congestion to the greatest possible extent by searching for non-conflicting paths in a centralized way. We implement the algorithm in the popular software-defined networking architecture and evaluate the algorithm's performance on the Mininet platform. The results show that our algorithm achieves higher bisection bandwidth than the traditional equal-cost multi-path load-balancing algorithm and thus avoids congestion more effectively.
Abstract: The sixth-generation (6G) mobile network is a multi-network, multi-scenario environment in which multiple network domains break their original fixed boundaries to form connections and converge. With the optimization objective of maximizing network utility while ensuring performance-centric weighted fairness among flows, this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration. Because the utilities of different flows conflict, the fair bandwidth allocation problem for various types of flows is formulated using different reward functions. Regarding the tradeoff between fairness and utility, this paper addresses the corresponding reward functions for the cases where flows change abruptly and where they change smoothly. In addition, to accommodate the Quality of Service (QoS) requirements of multiple types of flows, this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG. By introducing a Long Short-Term Memory (LSTM) layer in the actor and critic networks, more information about temporal continuity is added, further enhancing the ability to adapt to changes in the dynamic network environment. The LSTM+MADDPG algorithm is compared with the latest reinforcement learning algorithm in experiments on real network topology and traffic traces, and the results show that LSTM+MADDPG improves the delay convergence speed by 14.6% and delays the onset of packet loss by 18.2% compared with other algorithms.
Funding: This work was supported by the Serbian Ministry of Science and Education (project TR-32022) and by the companies Telekom Srbija and Informatika.
Abstract: Data center networks may comprise tens or hundreds of thousands of nodes and, naturally, suffer from frequent software and hardware failures as well as link congestion. Packets are routed along the shortest paths with sufficient resources to facilitate efficient network utilization and minimize delays. In such dynamic networks, links frequently fail or get congested, making the recalculation of the shortest paths a computationally intensive problem. Various routing protocols were proposed to overcome this problem by focusing on network utilization rather than speed. Surprisingly, the design of fast shortest-path algorithms for data centers was largely neglected, though they are universal components of routing protocols. Moreover, parallelization techniques were mostly deployed for random network topologies, and not for regular topologies that are often found in data centers. The aim of this paper is to improve scalability and reduce the time required for the shortest-path calculation in data center networks by parallelization on general-purpose hardware. We propose a novel algorithm that parallelizes edge relaxations as a faster and more scalable solution for popular data center topologies.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2015J009 and the Sichuan Province Scientific and Technological Support Project under Grants No. 2014GZ0017 and No. 2016GZ0093.
Abstract: In data centers, transmission control protocol (TCP) incast causes catastrophic goodput degradation for applications with a many-to-one traffic pattern. In this paper, we intend to tame incast at the receiver-side application. Towards this goal, we first develop an analytical model that formulates the incast probability as a function of connection variables and network environment settings. We combine the model with optimization theory and derive some insights into minimizing the incast probability by tuning connection variables related to applications. Then, guided by the analytical results, we propose an adaptive application-layer solution to TCP incast. The solution equally allocates advertised windows to concurrent connections, and dynamically adapts the number of concurrent connections to the varying conditions. Simulation results show that our solution consistently avoids incast and achieves high goodput in various scenarios, including those with multiple bottleneck links and background TCP traffic.
Funding: This work was supported by the Hainan Provincial Natural Science Foundation of China (620RC560, 2019RC096, 620RC562), the Scientific Research Setup Fund of Hainan University (KYQD(ZR)1877), the National Natural Science Foundation of China (62162021, 82160345, 61802092), the Key Research and Development Program of Hainan Province (ZDYF2020199, ZDYF2021GXJS017), and the Key Science and Technology Plan Project of Haikou (2011-016).
Abstract: As a critical infrastructure of cloud computing, data center networks (DCNs) directly determine the service performance of data centers, which provide computing services for various applications such as big data processing and artificial intelligence. However, current data center network architectures suffer from long routing paths and low fault tolerance between source and destination servers, which makes it hard to satisfy the requirements of high-performance data center networks. Based on dual-port servers and the Clos network structure, this paper proposes a novel architecture, RClos, to construct high-performance data center networks. Logically, the proposed architecture is constructed by inserting a dual-port server into each pair of adjacent switches in the fabric of switches, where switches are connected in the form of a ring Clos structure. We describe the structural properties of RClos in terms of network scale, bisection bandwidth, and network diameter. The RClos architecture inherits the characteristics of its embedded Clos network, which can accommodate a large number of servers with a small average path length. The proposed architecture also offers high fault tolerance, which makes it suitable for constructing various data center networks. For example, the average path length between servers is 3.44, and the standardized bisection bandwidth is 0.8 in RClos(32,5). The results of numerical experiments show that RClos enjoys a small average path length and high network fault tolerance, which are essential in the construction of high-performance data center networks.
Funding: Supported by the National Basic Research Program of China (973 Program) under Grants No. 2014CB347800 and No. 2012CB315803, the National High-Tech R&D Program of China (863 Program) under Grant No. 2013AA013303, the Natural Science Foundation of China under Grants No. 61170291, No. 61133006, and No. 61161140454, and the ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Many "rich-connected" topologies with multiple parallel paths between servers have been proposed for data center networks recently to provide high bisection bandwidth, but it remains challenging to fully utilize the high network capacity with appropriate multi-path routing algorithms. As flow-level path splitting may lead to traffic imbalance between paths due to flow-size differences, packet-level path splitting has attracted more attention lately; it spreads packets from flows onto multiple available paths and significantly improves link utilization. However, it may cause packet reordering, confusing the TCP congestion control algorithm and lowering the throughput of flows. In this paper, we design a novel packet-level multi-path routing scheme called SOPA, which leverages OpenFlow to perform packet-level path splitting in a round-robin fashion, and hence significantly mitigates the packet reordering problem and improves the network throughput. Moreover, SOPA leverages the topological features of data center networks to encode a very small number of switches along the path into the packet header, resulting in very light overhead. Compared with random packet spraying (RPS), Hedera and equal-cost multi-path routing (ECMP), our simulations demonstrate that SOPA achieves 29.87%, 50.41% and 77.74% higher network throughput respectively under permutation workload, and reduces average data transfer completion time by 53.65%, 343.31% and 348.25% respectively under production workload.
Funding: Supported in part by the National Natural Science Foundation of China (61601252, 61801254), Public Technology Projects of Zhejiang Province (LG-G18F020007), Zhejiang Provincial Natural Science Foundation of China (LY20F020008, LY18F020011, LY20F010004), and the K.C. Wong Magna Fund in Ningbo University.
Abstract: With the emergence of diverse applications in data centers, the demands on quality of service in data centers also become diverse, such as high throughput for elephant flows and low latency for deadline-sensitive flows. However, traditional TCPs are ill-suited to such situations and often result in inefficient data transfers (e.g., missed flow deadlines and throughput collapse). This further degrades the user-perceived quality of service (QoS) in data centers. To reduce the flow completion time of mice and deadline-sensitive flows while promoting the throughput of elephant flows, this paper proposes an efficient and deadline-aware priority-driven congestion control (PCC) protocol, which grants mice and deadline-sensitive flows the highest priority. Specifically, PCC computes the priority of different flows according to the size of transmitted data, the remaining data volume, and the flows' deadlines. Then PCC adjusts the congestion window according to the flow priority and the degree of network congestion. Furthermore, switches in data centers control the input/output of packets based on the flow priority and the queue length. Unlike existing TCPs, to speed up the data transfers of mice and deadline-sensitive flows, PCC provides an effective method to compute and encode the flow priority explicitly. According to the flow priority, switches can manage packets efficiently and ensure the data transfers of high-priority flows through weighted priority scheduling with minor modification. The experimental results show that PCC can improve the data transfer performance of mice and deadline-sensitive flows while guaranteeing the throughput of elephant flows.
Funding: Supported by King Khalid University through the Large Group Project (No. RGP.2/312/44).
Abstract: Network updates have become increasingly prevalent since the broad adoption of software-defined networks (SDNs) in data centers. Modern TCP designs, including the cutting-edge TCP variants DCTCP, CUBIC, and BBR, however, are not resilient to network updates that provoke flow rerouting. In this paper, we first demonstrate that popular TCP implementations perform inadequately in the presence of frequent and inconsistent network updates, because such updates result in out-of-order packets and in packet drops induced by transitory congestion, leading to serious performance deterioration. We look into the causes and propose a network update-friendly TCP (NUFTCP), which is an extension of the DCTCP variant, as a solution. Simulations are used to assess the proposed NUFTCP. Our findings reveal that NUFTCP can more effectively manage the problems of out-of-order packets and packet drops triggered by network updates, and it outperforms DCTCP considerably.
Funding: The Deanship of Scientific Research at Hashemite University partially funds this work. This research work was also funded by the Deanship of Scientific Research at the Northern Border University, Arar, KSA, through project number "NBU-FFR-2024-1580-08".
Abstract: Cloud Datacenter Network (CDN) providers usually have the option to scale their network structures to allow for far more resource capacity, though such scaling options may come with exponential costs that contradict their utility objectives. Besides the cost of the physical assets and network resources, such scaling may also impose more load on the electricity power grids to supply the added nodes with the energy required to run and cool them, which comes with extra costs too. Thus, those CDN providers who utilize their resources better can offer their services at lower unit prices than others who simply choose the scaling solutions. Resource utilization is quite a challenging process; indeed, clients of CDNs usually tend to exaggerate their true resource requirements when they lease their resources. Service providers are committed to their clients through Service Level Agreements (SLAs). Therefore, any amendment to the resource allocations needs to be approved by the clients first. In this work, we propose deploying a Stackelberg leadership framework to formulate a negotiation game between the cloud service providers and their client tenants. Through this, the providers seek to retrieve leased but unused resources from their clients. Cooperation is not expected from the clients, and they may ask high unit prices to return their extra resources to the provider's premises. Hence, to motivate cooperation in such a non-cooperative game, we developed an incentive-compatible pricing model for the returned resources as an extension of Vickrey auctions. Moreover, we also proposed building a behavior belief function that shapes the way of negotiation and compensation for each client. Compared to other benchmark models, the assessment results show that our proposed models provide timely negotiation schemes, allowing for better resource utilization rates, higher utilities, and grid-friendly CDNs.
Abstract: In the rising tide of the Internet of things, more and more things in the world are connected to the Internet. Recently, data have kept growing at a rate more than four times that expected from Moore's law. This explosion of data comes from various sources such as mobile phones, video cameras and sensor networks, which often present multidimensional characteristics. The huge amount of data brings many challenges to the IT infrastructures for management, transportation, and processing. To address these challenges, state-of-the-art large-scale data center networks have begun to provide cloud services that are increasingly prevalent. However, how to build a good data center remains an open challenge. Concurrently, the architecture design, which significantly affects the total performance, is of great research interest. This paper surveys advances in data center network design. We first introduce the upcoming trends in the data center industry. Then we review some popular design principles for today's data center network architectures. In the third part, we present some up-to-date data center frameworks and make a comprehensive comparison of them. During the comparison, we observe that there is no so-called optimal data center, and the design should differ according to the data placement, replication, processing, and query processing. After that, several existing challenges and limitations are discussed. Based on these observations, we point out some possible future research directions.
Funding: Supported in part by the National Science Foundation of the USA (Nos. ECCS 1128209, CNS 10655444, CCF 1028167, CNS 0948184, and CCF 0830289).
Abstract: Data Center Networks (DCNs) are the fundamental infrastructure for cloud computing. Driven by the massively parallel computing tasks in cloud computing, one-to-many data dissemination has become one of the most important traffic patterns in DCNs. Many architectures and protocols have been proposed to meet this demand. However, these proposals either require complicated configurations on switches and servers, or cannot deliver optimal performance. In this paper, we propose peer-assisted data dissemination for DCNs. This approach utilizes the rich physical connections with high bandwidths and multi-path connections to facilitate efficient one-to-many data dissemination. We prove that an optimal P2P data dissemination schedule exists for FatTree, a specially designed DCN architecture. We then present a theoretical analysis of this algorithm in the general multi-rooted tree topology, a widely used DCN architecture. Additionally, we explore the performance of an intuitive line structure for data dissemination. Our analysis and experimental results show that this simple structure is able to produce performance comparable to the optimal algorithm. Since DCN applications rely heavily on virtualization to achieve optimal resource sharing, we present a general implementation method for the proposed algorithms, which aims to mitigate the impact of the potentially high churn rate of the virtual machines.
Funding: Supported by the Research Fund of Ministry of Education-China Mobile (MCM20160304).
Abstract: In modern data centers, the power consumed by the network is a noticeable portion of the total energy budget, and thus improving the energy efficiency of data center networks (DCNs) truly matters. One effective way to improve this energy efficiency is to make the size of DCNs elastic along with traffic demands through flow consolidation and bandwidth scheduling, i.e., turning off unnecessary network components to reduce the power consumption. Meanwhile, with its native support for data center management, software defined networking (SDN) provides a paradigm to elastically control the resources of DCNs. To achieve such power savings, most prior efforts adopt a simple greedy heuristic to reduce computational complexity. However, due to the inherent limitations of greedy algorithms, a good-enough optimization cannot always be guaranteed. To address this problem, a modified hybrid genetic algorithm (MHGA) is employed to improve the solution's accuracy, and the fine-grained routing function of SDN is fully leveraged. The simulation results show that more efficient power management can be achieved than in previous studies, increasing network energy savings by about 5%.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61370209 and 61402230).
Abstract: Cloud data centers now provide a plethora of rich online applications such as web search, social networking, and cloud computing. A key challenge for such applications, however, is to meet soft real-time constraints. Due to the deadline-agnostic congestion control in the Transmission Control Protocol (TCP), many deadline-sensitive flows cannot finish transmission before their deadlines. In this paper, we propose an SDN-based Explicit-Deadline-aware TCP (SED) for cloud Data Center Networks (DCNs). SED assigns a base rate for non-deadline flows first and gives as much spare bandwidth as possible to the deadline flows. Subsequently, a Retransmission-enhanced SED (RSED) is introduced to solve the packet-loss timeout problem. Through our experiments, we show that SED can make flows meet deadlines effectively, and that it significantly outperforms previous protocols in the cloud data center environment.
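The allocation order described in this abstract (a base rate for non-deadline flows first, then spare bandwidth for deadline flows) can be illustrated with a minimal sketch. This is not the SED protocol itself: the function name `allocate_rates`, the earliest-deadline-first ordering, and the "remaining size divided by time left" demand are assumptions made only for illustration.

```python
def allocate_rates(capacity, n_background, base_rate, deadline_flows):
    """Split link capacity between non-deadline and deadline flows.

    capacity       -- link capacity (e.g. Mbit/s)
    n_background   -- number of non-deadline flows
    base_rate      -- base rate reserved for each non-deadline flow
    deadline_flows -- dict: name -> (remaining_size, time_left_seconds)

    Assumptions (not from the abstract): deadline flows are served
    earliest-deadline-first, and each asks for remaining_size / time_left.
    """
    # Step 1: reserve the base rate for non-deadline flows first.
    reserved = min(capacity, base_rate * n_background)
    spare = capacity - reserved

    # Step 2: hand the spare bandwidth to deadline flows, most urgent first.
    rates = {}
    by_urgency = sorted(deadline_flows.items(), key=lambda item: item[1][1])
    for name, (remaining, time_left) in by_urgency:
        wanted = remaining / time_left   # rate needed to finish on time
        granted = min(wanted, spare)     # cannot exceed what is left
        rates[name] = granted
        spare -= granted

    background_rate = reserved / n_background if n_background else 0.0
    return rates, background_rate


if __name__ == "__main__":
    flows = {"f1": (100.0, 2.0), "f2": (120.0, 3.0)}
    print(allocate_rates(100.0, 2, 10.0, flows))
```

In this toy run the second deadline flow receives less than it asks for, which is the kind of shortfall a real controller would have to handle (for example by rejecting or deprioritizing the flow).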
Funding: Supported by the National Natural Science Foundation of China (61002011), the Open Fund of the State Key Laboratory of Software Development Environment (SKLSDE-2009KF-2-08), the National Basic Research Program of China (2009CB320505), and the Hi-Tech Research and Development Program of China (2011AA01A102).
Abstract: Ethernet link aggregation, which provides an easy and cost-effective way to increase both bandwidth and link availability between a pair of devices, is well suited for data center networks. However, all the traffic splitting algorithms used in existing Ethernet link aggregation are flow-level, and they do not work well owing to the traffic characteristics of data centers. Though frame-level traffic splitting can achieve optimal load balance and the maximum benefit from aggregated capacity, it is generally deprecated because of frame disordering, which can disrupt the operation of many Internet protocols, most notably the transmission control protocol (TCP). To address this issue, we first investigate the causes of frame disordering in link aggregation and find that all of them either no longer hold or can be prevented in data centers. Then we present a byte-counter frame-level traffic splitting algorithm which achieves optimal performance while causing no frame disordering. The only requirement is that frames in a flow are the same size, which can be easily met in data centers. Simulation results show that the proposed frame-level traffic splitting method achieves higher throughput and optimal load balance. The completion time of different-sized flows is reduced by 24% on average and by up to 46%.
Funding: Supported in part by the Open Foundation of State Key Laboratory of Information Photonics and Optical Communications (Grant No. IPOC2014B009), the Fundamental Research Funds for the Central Universities (Grant Nos. N130817002, N140405005, N150401002), the Foundation of the Education Department of Liaoning Province (Grant No. L2014089), the National Natural Science Foundation of China (Grant Nos. 61302070, 61401082, 61471109, 61502075), the Liaoning Bai Qian Wan Talents Program, and the National High-Level Personnel Special Support Program for Youth Top-Notch Talent.
Abstract: Elastic interconnection has enabled high-rate data transmission among data centers (DCs), giving rise to the elastic data center network (EDCN). In EDCNs, it is essential to achieve virtual network (VN) embedding, which includes two main components: virtual machine (VM) mapping and virtual link (VL) mapping. In VM mapping, we allocate appropriate servers to hold VMs, while in VL mapping, an optimal substrate path is determined for each virtual lightpath. For VN embedding in EDCNs, power efficiency is a significant concern, and some solutions have been proposed that put lightly loaded servers to sleep. However, the increasing communication traffic between VMs leads to a serious energy dissipation problem, since it also consumes a great amount of energy on switches even when the energy-efficient optical transmission technique is used. In this paper, considering load balancing and power-efficient VN embedding, we formulate the problem and design a novel heuristic for EDCNs, with the objective of achieving power savings on servers and switches. In our solution, VMs are mapped into a single DC or into multiple DCs that are a short distance from each other, and servers in the same cluster or adjacent clusters are preferred to hold VMs. As a result, a large number of servers and switches become vacant and can go into sleep mode. Simulation results demonstrate that our method performs well in terms of power savings and load balancing. Compared with benchmarks, the improvement ratio of power efficiency is 5%–13%.
Funding: National Key Research and Development Program of China (2018YFB2101300); National Natural Science Foundation of China (61872147); Dean's Fund of the Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education (East China Normal University).
Abstract: To support the needs of ever-growing cloud-based services, the number of servers and network devices in data centers is increasing exponentially, which in turn results in high complexity and difficulty in network optimization. Machine learning (ML) provides an effective way to deal with these challenges by enabling network intelligence. To this end, numerous creative ML-based approaches have been put forward in recent years. Nevertheless, the intelligent optimization of data center networks (DCNs) still faces enormous challenges. To the best of our knowledge, there is a lack of systematic and original investigations with in-depth analysis of intelligent DCNs. In this paper, we therefore investigate the application of ML to DCN optimization and provide a general overview and in-depth analysis of recent works, covering flow prediction, flow classification, and resource management. Moreover, we also give unique insights into the technology evolution of the fusion of DCN and ML, together with some challenges and future research opportunities.
Funding: Supported by the ZTE-BJTU Collaborative Research Program under Grant No. K11L00190 and the Fundamental Research Funds for the Central Universities under Grant No. K12JB00060.
Abstract: 1 Introduction. The history of data centers can be traced back to the 1960s. Early data centers were deployed on mainframes that were time-shared by users via remote terminals. The boom in data centers came during the internet era. Many companies started building large internet-connected facilities,
Abstract: The primary focus of this paper is to design a progressive restoration plan for an enterprise data center environment following a partial or full disruption. Repairing and restoring disrupted components in an enterprise data center requires a significant amount of time and human effort. Following a major disruption, the recovery process involves multiple stages, and during each stage, the partially recovered infrastructure can provide limited services to users at some degraded service level. However, how fast and efficiently an enterprise infrastructure can be recovered depends on how the recovery mechanism restores the disrupted components, considering the inter-dependencies between services, along with the limitations of expert human operators. The entire problem turns out to be NP-hard and rather complex, and we devise an efficient meta-heuristic to solve it. By considering some real-world examples, we show that the proposed meta-heuristic provides very accurate results, and still runs 600-2800 times faster than the optimal solution obtained from a general-purpose mathematical solver [1].
Funding: Supported by the National Natural Science Foundation of China (The key trusted running technologies for the sensing nodes in Internet of things: 61501007) and the Outstanding Personnel Training Program of the Beijing Municipal Party Committee Organization Department (The Research of Trusted Computing environment for Internet of things in Smart City: 2014000020124G041).
Abstract: In view of the high operating costs and the large amount of energy wasted in current data center network architectures, we propose a trusted flow preemption scheduling scheme combined with an energy-saving routing mechanism, based on a typical data center network architecture. The mechanism lets each network flow use its exclusive link bandwidth and transmission path, which can improve link utilization and network energy efficiency. Meanwhile, we apply trusted computing to guarantee a routing and forwarding service with high security, high performance, and high fault tolerance, which helps improve the average completion time of network flows.
Funding: This work is funded by the National Natural Science Foundation of China under Grant No. 61772180 and the Key R&D Plan of Hubei Province (2020BHB004, 2020BAB012).
Abstract: According to Cisco's Internet Report 2020 white paper, there will be 29.3 billion connected devices worldwide by 2023, up from 18.4 billion in 2018. 5G connections will generate nearly three times more traffic than 4G connections. While bringing a boom to the network, this also presents unprecedented challenges in terms of flow forwarding decisions. The path assignment mechanism used in traditional traffic scheduling methods tends to cause local network congestion through the concentration of elephant flows, resulting in unbalanced network load and degraded quality of service. Using the centralized control of software-defined networks, this study proposes a data center traffic scheduling strategy for minimizing congestion and guaranteeing quality of service (MCQG). The ideal transmission path is selected for data flows while considering the network congestion rate and quality of service. Different traffic scheduling strategies are used according to the characteristics of different service types in data centers. Elephant flows that tend to cause local congestion are rerouted: the path evaluation function is formed from the maximum link utilization on the path, the number of elephant flows, and the time delay, and the fast merit-seeking capability of the sparrow search algorithm is used to find the path with the lowest actual link overhead as the rerouting path for the elephant flows, which reduces the possibility of local network congestion. Equal-cost multi-path (ECMP) protocols with faster response time are used to schedule mouse flows with shorter duration, which guarantees the quality of service of the network. Together these achieve isolated transmission of the various types of data streams. The experimental results show that the proposed strategy has higher throughput, better network load balancing, and better robustness compared to ECMP under different traffic models. In addition, because it can fully utilize the resources in the network, MCQG also outperforms another traffic scheduling strategy that reroutes elephant flows (namely Hedera). Compared with ECMP and Hedera, MCQG improves average throughput by 11.73% and 4.29%, and normalized total throughput by 6.74% and 2.64%, respectively; MCQG improves link utilization by 23.25% and 15.07%; in addition, the average round-trip delay and packet loss rate fluctuate significantly less than with the two compared strategies.
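The path evaluation idea in this abstract (a score built from the maximum link utilization on a path, the number of elephant flows, and the delay) can be sketched as follows. The abstract does not give the formula, so the weighted sum, the weights, the normalization caps, and all names (`PathStats`, `path_cost`, `pick_reroute_path`) are hypothetical. The sketch also replaces the sparrow search algorithm with a plain exhaustive minimum over a small candidate set.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    max_link_utilization: float  # highest utilization of any link on the path, in [0, 1]
    elephant_flows: int          # elephant flows currently routed over the path
    delay_ms: float              # measured path delay in milliseconds


def path_cost(p, w_util=0.5, w_eleph=0.3, w_delay=0.2,
              max_elephants=10, max_delay_ms=10.0):
    """Weighted-sum cost of a path; lower is better.

    The weights and normalization caps are illustrative assumptions,
    not values taken from the paper.
    """
    eleph = min(p.elephant_flows / max_elephants, 1.0)  # normalize to [0, 1]
    delay = min(p.delay_ms / max_delay_ms, 1.0)         # normalize to [0, 1]
    return w_util * p.max_link_utilization + w_eleph * eleph + w_delay * delay


def pick_reroute_path(candidates):
    """Return the name of the lowest-cost candidate path.

    Exhaustive search stands in for the sparrow search algorithm here.
    """
    return min(candidates, key=lambda name: path_cost(candidates[name]))


if __name__ == "__main__":
    candidates = {
        "via_core_1": PathStats(0.9, 4, 2.0),
        "via_core_2": PathStats(0.3, 1, 3.0),
    }
    print(pick_reroute_path(candidates))
```

With only a handful of equal-cost candidates, exhaustive search is enough for illustration; a metaheuristic becomes relevant when the candidate space is too large to enumerate.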
Funding: Supported by the National Basic Research Program of China (973 Program) (2012CB315903), the Key Science and Technology Innovation Team Project of Zhejiang Province (2011R50010-05), the National Science and Technology Support Program (2014BAH24F01), the 863 Program of China (2012AA01A507), and the National Natural Science Foundation of China (61379118 and 61103200), and sponsored by the Research Fund of ZTE Corporation.
Abstract: In a data center network (DCN), load balancing is required when servers transfer data on the same path. This is necessary to avoid congestion. Load balancing is challenged by dynamic transfer demands and complex routing control. Because of the distributed nature of a traditional network, previous research on load balancing has mostly focused on improving the performance of the local network; thus, the load has not been optimally balanced across the entire network. In this paper, we propose a novel dynamic load-balancing algorithm for fat-tree. This algorithm avoids congestion to the greatest possible extent by searching for non-conflicting paths in a centralized way. We implement the algorithm in the popular software-defined networking architecture and evaluate the algorithm's performance on the Mininet platform. The results show that our algorithm has higher bisection bandwidth than the traditional equal-cost multi-path load-balancing algorithm and thus more effectively avoids congestion.