The sixth-generation (6G) mobile network is a multi-network, multi-scenario environment in which multiple network domains break their original fixed boundaries to interconnect and converge. With the optimization objective of maximizing network utility while ensuring performance-centric weighted fairness among flows, this paper designs a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration. Because the utilities of different flows conflict, the fair bandwidth allocation problem for the various flow types is formulated using differently defined reward functions. Regarding the tradeoff between fairness and utility, the paper treats separately the reward functions for the cases where flows change abruptly and where they change smoothly. In addition, to accommodate the Quality of Service (QoS) requirements of multiple flow types, the paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG. Introducing a Long Short-Term Memory (LSTM) layer into the actor and critic networks adds information about temporal continuity, further enhancing the ability to adapt to changes in the dynamic network environment. LSTM+MADDPG is compared with the latest reinforcement learning algorithm in experiments on real network topology and traffic traces; the results show that LSTM+MADDPG improves the delay convergence speed by 14.6% and delays the onset of packet loss by 18.2% compared with other algorithms.
Data center networks may comprise tens or hundreds of thousands of nodes and naturally suffer from frequent software and hardware failures as well as link congestion. Packets are routed along the shortest paths with sufficient resources to facilitate efficient network utilization and minimize delays. In such dynamic networks, links frequently fail or become congested, making the recalculation of shortest paths a computationally intensive problem. Various routing protocols were proposed to overcome this problem by focusing on network utilization rather than speed. Surprisingly, the design of fast shortest-path algorithms for data centers has been largely neglected, though they are universal components of routing protocols. Moreover, parallelization techniques were mostly deployed for random network topologies, not for the regular topologies often found in data centers. The aim of this paper is to improve scalability and reduce the time required for shortest-path calculation in data center networks through parallelization on general-purpose hardware. We propose a novel algorithm that parallelizes edge relaxations as a faster and more scalable solution for popular data center topologies.
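To make "parallelizing edge relaxations" concrete, here is a minimal sketch of a round-based (Bellman-Ford style) shortest-path computation. It is not the algorithm proposed in the paper, which the abstract does not describe; the function names and the small example topology are assumptions made for illustration. The point is that, within one round, every edge is checked against the same snapshot of distances, so the checks are independent and could be distributed across threads or cores.

```python
import math


def relax_round(edges, dist):
    """Run one relaxation round.

    Every edge reads from the same snapshot `dist`, so the per-edge checks
    are independent of each other; this loop is the unit that a parallel
    implementation could split across workers (here it runs sequentially).
    """
    new_dist = dict(dist)
    for u, v, w in edges:
        if dist[u] + w < new_dist[v]:
            new_dist[v] = dist[u] + w
    return new_dist


def shortest_paths(nodes, edges, source):
    """Return shortest distances from `source` using repeated rounds."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    # At most len(nodes) - 1 rounds are needed for non-negative weights.
    for _ in range(len(nodes) - 1):
        new_dist = relax_round(edges, dist)
        if new_dist == dist:  # no distance improved: converged early
            break
        dist = new_dist
    return dist


def undirected(links):
    """Expand (u, v, w) links into directed edges in both directions."""
    return [(u, v, w) for u, v, w in links] + [(v, u, w) for u, v, w in links]


# Hypothetical five-node topology: two hosts, two edge switches, one aggregation switch.
NODES = ["h1", "e1", "a1", "e2", "h2"]
EDGES = undirected([
    ("h1", "e1", 1), ("e1", "a1", 1), ("a1", "e2", 1),
    ("e2", "h2", 1), ("e1", "e2", 5),
])
```

Calling `shortest_paths(NODES, EDGES, "h1")` returns a dictionary of distances from `h1`. Regular data center topologies tend to have small diameters, so such a computation converges in few rounds, which is one reason round-based relaxation is a natural candidate for parallelization.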
As a critical infrastructure of cloud computing, data center networks (DCNs) directly determine the service performance of data centers, which provide computing services for applications such as big data processing and artificial intelligence. However, current data center network architectures suffer from long routing paths and low fault tolerance between source and destination servers, which makes it hard to satisfy the requirements of high-performance data center networks. Based on dual-port servers and the Clos network structure, this paper proposes a novel architecture, RClos, for constructing high-performance data center networks. Logically, the proposed architecture is constructed by inserting a dual-port server between each pair of adjacent switches in the switch fabric, where the switches are connected in a ring Clos structure. We describe the structural properties of RClos in terms of network scale, bisection bandwidth, and network diameter. RClos inherits the characteristics of its embedded Clos network and can accommodate a large number of servers with a small average path length. The proposed architecture offers high fault tolerance and is suitable for constructing various data center networks. For example, in RClos(32,5) the average path length between servers is 3.44 and the standardized bisection bandwidth is 0.8. Numerical experiments show that RClos has a small average path length and high network fault tolerance, which are essential in the construction of high-performance data center networks.
With the emergence of diverse applications in data centers, the demands on quality of service have also become diverse, such as high throughput for elephant flows and low latency for deadline-sensitive flows. However, traditional TCP variants are ill-suited to such situations and often result in inefficient data transfers (e.g., missed flow deadlines and throughput collapse). This further degrades the user-perceived quality of service (QoS) in data centers. To reduce the flow completion time of mice and deadline-sensitive flows while promoting the throughput of elephant flows, this paper proposes an efficient, deadline-aware, priority-driven congestion control (PCC) protocol that grants mice and deadline-sensitive flows the highest priority. Specifically, PCC computes the priority of each flow according to the size of the transmitted data, the remaining data volume, and the flow's deadline. PCC then adjusts the congestion window according to the flow priority and the degree of network congestion. Furthermore, switches in data centers control the input and output of packets based on the flow priority and the queue length. Unlike existing TCP variants, PCC provides an effective method to compute and encode the flow priority explicitly in order to speed up the data transfers of mice and deadline-sensitive flows. Using the flow priority, switches can manage packets efficiently and ensure the data transfers of high-priority flows through weighted priority scheduling with minor modification. The experimental results show that PCC improves the data transfer performance of mice and deadline-sensitive flows while guaranteeing the throughput of elephant flows.
1 Introduction
The history of data centers can be traced back to the 1960s. Early data centers were deployed on mainframes that were time-shared by users via remote terminals. The boom in data centers came during the internet era. Many companies started building large internet-connected facilities,
To address the high operating costs and substantial energy waste in current data center network architectures, we propose a trusted flow preemption scheduling scheme combined with an energy-saving routing mechanism, based on a typical data center network architecture. The mechanism lets each network flow use its exclusive link bandwidth and transmission path, which improves link utilization and network energy efficiency. Meanwhile, we apply trusted computing to guarantee a highly secure, high-performance, and highly fault-tolerant routing and forwarding service, which helps improve the average completion time of network flows.
In data centers, transmission control protocol (TCP) incast causes catastrophic goodput degradation for applications with a many-to-one traffic pattern. In this paper, we aim to tame incast at the receiver-side application. Toward this goal, we first develop an analytical model that formulates the incast probability as a function of connection variables and network environment settings. We combine the model with optimization theory and derive insights into minimizing the incast probability by tuning application-related connection variables. Then, guided by the analytical results, we propose an adaptive application-layer solution to TCP incast. The solution allocates advertised windows equally among concurrent connections and dynamically adapts the number of concurrent connections to varying conditions. Simulation results show that our solution consistently avoids incast and achieves high goodput in various scenarios, including those with multiple bottleneck links and background TCP traffic.
Many "rich-connected" topologies with multiple parallel paths between servers have recently been proposed for data center networks to provide high bisection bandwidth, but it remains challenging to fully utilize the high network capacity with appropriate multi-path routing algorithms. Because flow-level path splitting may lead to traffic imbalance between paths due to differences in flow size, packet-level path splitting has attracted more attention lately; it spreads the packets of a flow over multiple available paths and significantly improves link utilization. However, it may cause packet reordering, which confuses the TCP congestion control algorithm and lowers flow throughput. In this paper, we design a novel packet-level multi-path routing scheme called SOPA, which leverages OpenFlow to perform packet-level path splitting in a round-robin fashion, and hence significantly mitigates the packet reordering problem and improves network throughput. Moreover, SOPA leverages the topological features of data center networks to encode a very small number of switches along the path into the packet header, resulting in very light overhead. Compared with random packet spraying (RPS), Hedera, and equal-cost multi-path routing (ECMP), our simulations demonstrate that SOPA achieves 29.87%, 50.41%, and 77.74% higher network throughput, respectively, under permutation workload, and reduces average data transfer completion time by 53.65%, 343.31%, and 348.25%, respectively, under production workload.
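The contrast between flow-level and packet-level path splitting can be illustrated with a small load-accounting sketch. This is not SOPA itself (no OpenFlow rules and no header encoding are modeled); the function names, the CRC32 hash, and the packet trace are assumptions chosen for illustration only.

```python
import zlib
from collections import Counter


def ecmp_path(flow_id: str, n_paths: int) -> int:
    """Flow-level choice: hash the flow identifier once, so every packet of
    the flow follows the same path (CRC32 stands in for a switch hash)."""
    return zlib.crc32(flow_id.encode()) % n_paths


def flow_level_split(packets, n_paths):
    """Return bytes carried per path when each flow is pinned to one path."""
    load = Counter()
    for flow_id, size in packets:
        load[ecmp_path(flow_id, n_paths)] += size
    return [load[p] for p in range(n_paths)]


def packet_level_round_robin(packets, n_paths):
    """Return bytes carried per path when each flow's packets rotate over
    all paths in round-robin order."""
    load = [0] * n_paths
    next_path = {}  # per-flow pointer to the path for the next packet
    for flow_id, size in packets:
        p = next_path.get(flow_id, 0)
        load[p] += size
        next_path[flow_id] = (p + 1) % n_paths
    return load


# Hypothetical trace: one elephant flow (8 packets) and one mouse flow (4 packets).
PACKETS = [("elephant", 1500)] * 8 + [("mouse", 1500)] * 4
```

With four paths, `packet_level_round_robin(PACKETS, 4)` spreads the bytes evenly, whereas `flow_level_split(PACKETS, 4)` leaves all of the elephant flow's bytes on a single path. The sketch only counts bytes; it says nothing about packet reordering, which is the problem the abstract says SOPA is designed to mitigate.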
In a data center network (DCN), load balancing is required when servers transfer data on the same path, in order to avoid congestion. Load balancing is made difficult by dynamically changing transfer demands and complex routing control. Because of the distributed nature of traditional networks, previous research on load balancing has mostly focused on improving the performance of the local network; thus, the load has not been optimally balanced across the entire network. In this paper, we propose a novel dynamic load-balancing algorithm for fat-tree topologies. The algorithm avoids congestion to the greatest possible extent by searching for non-conflicting paths in a centralized way. We implement the algorithm in the popular software-defined networking architecture and evaluate its performance on the Mininet platform. The results show that our algorithm achieves higher bisection bandwidth than the traditional equal-cost multi-path load-balancing algorithm and thus avoids congestion more effectively.
With the continuous expansion of data center network scale, changing network requirements, and increasing pressure on network bandwidth, the traditional network architecture can no longer meet users' needs. The development of software-defined networking (SDN) has brought new opportunities and challenges to future networks. The separation of the data and control planes in SDN improves the performance of the entire network. Researchers have integrated the SDN architecture into data centers to improve network resource utilization and performance. This paper first introduces the basic concepts of SDN and data center networks. It then discusses SDN-based load balancing mechanisms for data centers from different perspectives. Finally, it summarizes the research on SDN-based load balancing mechanisms and looks ahead to its development trends.
The capability of the data center network largely determines the performance of cloud computing. However, the number of servers in data center networks keeps growing because of the continuous growth of application requirements. Improving the performance of cloud computing therefore faces the challenge of how to connect a large number of servers into a data center network with promising performance. Traditional tree-based data center networks have issues such as bandwidth bottlenecks and single-switch failures. Recently proposed data center networks such as DCell, FiConn, and BCube have larger bandwidth and better fault tolerance than traditional tree-based data center networks. Nonetheless, for DCell and FiConn, the length of the fault-tolerant path between servers increases when switches fail, and BCube requires higher-performance switches as its scale grows. Based on these considerations, we propose a new server-centric data center network with excellent performance, called BCDC, based on the crossed cube. We then study the connectivity of BCDC networks. Furthermore, we propose communication algorithms and a fault-tolerant routing algorithm for BCDC networks. Moreover, we analyze the performance and time complexity of the proposed algorithms. Our research provides a basis for the design and implementation of a new family of data center networks.
In the rising tide of the Internet of Things, more and more things in the world are connected to the Internet. Recently, data have kept growing at a rate more than four times that expected from Moore's law. This explosion of data comes from various sources such as mobile phones, video cameras, and sensor networks, and the data often present multidimensional characteristics. The huge amount of data brings many challenges for the IT infrastructures that manage, transport, and process it. To address these challenges, state-of-the-art large-scale data center networks have begun to provide cloud services that are increasingly prevalent. However, how to build a good data center remains an open challenge. At the same time, the architecture design, which significantly affects overall performance, is of great research interest. This paper surveys advances in data center network design. We first introduce the upcoming trends in the data center industry. Then we review some popular design principles for today's data center network architectures. In the third part, we present some up-to-date data center frameworks and compare them comprehensively. During the comparison, we observe that there is no so-called optimal data center and that the design should differ depending on data placement, replication, processing, and query processing. After that, several existing challenges and limitations are discussed. Based on these observations, we point out some possible future research directions.
According to Cisco's Internet Report 2020 white paper, there will be 29.3 billion connected devices worldwide by 2023, up from 18.4 billion in 2018, and 5G connections will generate nearly three times more traffic than 4G connections. While this brings a boom to the network, it also presents unprecedented challenges for flow forwarding decisions. The path assignment mechanism used in traditional traffic scheduling methods tends to cause local network congestion through the concentration of elephant flows, resulting in unbalanced network load and degraded quality of service. Using the centralized control of software-defined networks, this study proposes a data center traffic scheduling strategy for minimizing congestion and guaranteeing quality of service (MCQG). The ideal transmission path is selected for data flows while considering the network congestion rate and quality of service. Different traffic scheduling strategies are used according to the characteristics of the different service types in data centers. Elephant flows, which tend to cause local congestion, are rerouted. The path evaluation function is formed from the maximum link utilization on the path, the number of elephant flows, and the time delay, and the fast optimum-seeking capability of the sparrow search algorithm is used to find the path with the lowest actual link overhead as the rerouting path for the elephant flows, which reduces the possibility of local network congestion. Equal-cost multi-path (ECMP) protocols, which have faster response times, are used to schedule short-lived mouse flows in order to guarantee the quality of service of the network and to achieve isolated transmission of the various types of data streams. The experimental results show that the proposed strategy has higher throughput, better network load balancing, and better robustness than ECMP under different traffic models. In addition, because it can fully utilize the resources in the network, MCQG also outperforms Hedera, another traffic scheduling strategy that reroutes elephant flows. Compared with ECMP and Hedera, MCQG improves average throughput by 11.73% and 4.29% and normalized total throughput by 6.74% and 2.64%, respectively; MCQG improves link utilization by 23.25% and 15.07%; in addition, its average round-trip delay and packet loss rate fluctuate significantly less than those of the two compared strategies.
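The abstract names three inputs to the path evaluation function (maximum link utilization on the path, number of elephant flows, and delay) but not how they are combined. The sketch below assumes a weighted sum of normalized terms; the weights, normalization caps, class name, and candidate values are all hypothetical. It also swaps the sparrow search algorithm for a plain exhaustive minimum, which is adequate only when the candidate set is small.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    name: str
    max_link_utilization: float  # busiest link on the path, in the range 0..1
    elephant_flows: int          # elephant flows currently using the path
    delay_ms: float              # measured path delay in milliseconds


def path_cost(p: PathStats, w_util=0.5, w_eleph=0.3, w_delay=0.2,
              max_elephants=10, max_delay_ms=10.0) -> float:
    """Assumed cost: weighted sum of the three factors, each scaled to 0..1.
    Weights and caps are illustrative, not taken from the paper."""
    return (w_util * p.max_link_utilization
            + w_eleph * min(p.elephant_flows / max_elephants, 1.0)
            + w_delay * min(p.delay_ms / max_delay_ms, 1.0))


def pick_reroute_path(candidates):
    """Exhaustive search for the lowest-cost path (stand-in for the
    sparrow search algorithm used in the paper)."""
    return min(candidates, key=path_cost)


# Hypothetical candidate paths for rerouting one elephant flow.
CANDIDATES = [
    PathStats("A", max_link_utilization=0.9, elephant_flows=4, delay_ms=2.0),
    PathStats("B", max_link_utilization=0.4, elephant_flows=1, delay_ms=3.0),
    PathStats("C", max_link_utilization=0.3, elephant_flows=6, delay_ms=8.0),
]
```

Under these assumed weights, `pick_reroute_path(CANDIDATES)` selects path B: path C has the lowest utilization, but it already carries more elephant flows and has a higher delay, which the combined cost penalizes.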
Data Center Networks (DCNs) are the fundamental infrastructure for cloud computing. Driven by the massively parallel computing tasks in cloud computing, one-to-many data dissemination has become one of the most important traffic patterns in DCNs. Many architectures and protocols have been proposed to meet this demand. However, these proposals either require complicated configuration of switches and servers or cannot deliver optimal performance. In this paper, we propose peer-assisted data dissemination for DCNs. This approach utilizes the rich, high-bandwidth physical connections and multi-path connections to facilitate efficient one-to-many data dissemination. We prove that an optimal P2P data dissemination schedule exists for FatTree, a specially designed DCN architecture. We then present a theoretical analysis of this algorithm in the general multi-rooted tree topology, a widely used DCN architecture. Additionally, we explore the performance of an intuitive line structure for data dissemination. Our analysis and experimental results show that this simple structure achieves performance comparable to that of the optimal algorithm. Since DCN applications rely heavily on virtualization to achieve optimal resource sharing, we present a general implementation method for the proposed algorithms, which aims to mitigate the impact of the potentially high churn rate of virtual machines.
In modern data centers, the power consumed by the network is a noticeable portion of the total energy budget, and thus improving the energy efficiency of data center networks (DCNs) truly matters. One effective way to achieve this is to make the size of the DCN elastic with respect to traffic demands through flow consolidation and bandwidth scheduling, i.e., turning off unnecessary network components to reduce power consumption. Meanwhile, with its intrinsic support for data center management, software-defined networking (SDN) provides a paradigm for elastically controlling the resources of DCNs. To achieve such power savings, most prior efforts adopt simple greedy heuristics to reduce computational complexity. However, due to the inherent limitations of greedy algorithms, a good-enough optimization cannot always be guaranteed. To address this problem, a modified hybrid genetic algorithm (MHGA) is employed to improve solution accuracy, and the fine-grained routing function of SDN is fully leveraged. The simulation results show that more efficient power management can be achieved than in previous studies, increasing network energy savings by about 5%.
The data center network (DCN), an important component of data centers, consists of a large number of hosted servers and switches connected by high-speed communication links. A DCN enables the centralized deployment of resources and on-demand user access to the information and services of data centers. In recent years, the scale of DCNs has constantly increased with the widespread use of cloud-based services and the unprecedented amount of data delivered within and between data centers, whereas the traditional DCN architecture lacks the aggregate bandwidth, scalability, and cost-effectiveness needed to cope with the increasing demands of tenants accessing the services of cloud data centers. Therefore, a novel DCN architecture offering scalability, low cost, robustness, and energy conservation is required. This paper reviews recent research findings and technologies of DCN architectures to identify the issues in existing DCN architectures for cloud computing. We develop a taxonomy for classifying current DCN architectures and qualitatively analyze traditional and contemporary DCN architectures. Moreover, the DCN architectures are compared on the basis of significant characteristics such as bandwidth, fault tolerance, scalability, overhead, and deployment cost. Finally, we put forward open research issues in the deployment of scalable, low-cost, robust, and energy-efficient DCN architectures for data centers in computational clouds.
Cloud data centers now provide a plethora of rich online applications such as web search, social networking, and cloud computing. A key challenge for such applications, however, is to meet soft real-time constraints. Due to the deadline-agnostic congestion control in the Transmission Control Protocol (TCP), many deadline-sensitive flows cannot finish transmission before their deadlines. In this paper, we propose an SDN-based Explicit-Deadline-aware TCP (SED) for cloud Data Center Networks (DCNs). SED first assigns a base rate to non-deadline flows and gives as much spare bandwidth as possible to the deadline flows. Subsequently, a Retransmission-enhanced SED (RSED) is introduced to solve the packet-loss timeout problem. Through our experiments, we show that SED can make flows meet deadlines effectively and that it significantly outperforms previous protocols in the cloud data center environment.
Ethernet link aggregation, which provides an easy and cost-effective way to increase both bandwidth and link availability between a pair of devices, is well suited to data center networks. However, all the traffic splitting algorithms used in existing Ethernet link aggregation operate at the flow level, which does not work well given the traffic characteristics of data centers. Although frame-level traffic splitting can achieve optimal load balance and the maximum benefit from aggregated capacity, it is generally deprecated because of frame disordering, which can disrupt the operation of many Internet protocols, most notably the transmission control protocol (TCP). To address this issue, we first investigate the causes of frame disordering in link aggregation and find that all of them either no longer hold or can be prevented in data centers. We then present a byte-counter frame-level traffic splitting algorithm that achieves optimal performance while causing no frame disordering. The only requirement is that the frames in a flow are the same size, which can easily be met in data centers. Simulation results show that the proposed frame-level traffic splitting method achieves higher throughput and optimal load balance. The average completion time of flows of different sizes is reduced by 24% on average and by up to 46%.
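The abstract does not spell out how the byte-counter algorithm works, so the following is only a guess at the general idea: keep a running byte counter per member link and send each frame on the link that has carried the fewest bytes so far. The function name and tie-breaking rule are assumptions, and delivery order at the receiver is not modeled.

```python
def split_frames(frame_sizes, n_links):
    """Assign each frame to the member link with the smallest byte counter.

    Ties go to the lowest-numbered link. Returns the per-frame link
    assignment and the final per-link byte counters.
    """
    sent = [0] * n_links
    assignment = []
    for size in frame_sizes:
        link = min(range(n_links), key=lambda i: sent[i])
        sent[link] += size
        assignment.append(link)
    return assignment, sent
```

When all frames are the same size, as the abstract requires, this rule degenerates to strict round-robin over the links (for example, six 1500-byte frames over three links go to links 0, 1, 2, 0, 1, 2), so equal-rate links finish their frames in the order they were sent. With mixed frame sizes the pattern becomes irregular, which is consistent with the same-size requirement stated in the abstract.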
To support the needs of ever-growing cloud-based services, the number of servers and network devices in data centers is increasing exponentially, which in turn makes network optimization highly complex and difficult. Machine learning (ML) provides an effective way to deal with these challenges by enabling network intelligence, and numerous creative ML-based approaches have been put forward in recent years. Nevertheless, the intelligent optimization of data center networks (DCNs) still faces enormous challenges. To the best of our knowledge, there is a lack of systematic and original investigations with in-depth analysis of intelligent DCNs. In this paper, we therefore investigate the application of ML to DCN optimization and provide a general overview and in-depth analysis of recent work, covering flow prediction, flow classification, and resource management. Moreover, we give unique insights into the technology evolution of the fusion of DCN and ML, together with some challenges and future research opportunities.
With the continuous enrichment of cloud services, an increasing number of applications are being deployed in data centers. These emerging applications are often communication-intensive and data-parallel, and their performance is closely related to the underlying network. With their distributed nature, the applications consist of tasks that involve a collection of parallel flows. Traditional techniques to optimize flow-level metrics are agnostic to task-level requirements, leading to poor application-level performance. In this paper, we address the heterogeneous task-level requirements of applications and propose task-aware flow scheduling. First, we model tasks' sensitivity to their completion time by utilities. Second, on the basis of Nash bargaining theory, we establish a flow scheduling model with heterogeneous utility characteristics and analyze it using the Lagrange multiplier method and the KKT conditions. Third, we propose two utility-aware bandwidth allocation algorithms with different practical constraints. Finally, we present Tasch, a system that enables tasks to maintain high utilities and guarantees the fairness of utilities. To demonstrate the feasibility of our system, we conduct comprehensive evaluations with real-world traffic traces. Communication stages complete up to 1.4 faster on average, task utilities increase up to 2.26, and the fairness of tasks improves up to 8.66 using Tasch in comparison to per-flow mechanisms.
Funding: This work was supported by the Serbian Ministry of Science and Education (project TR-32022) and by the companies Telekom Srbija and Informatika.
Funding: This work was supported by the Hainan Provincial Natural Science Foundation of China (620RC560, 2019RC096, 620RC562), the Scientific Research Setup Fund of Hainan University (KYQD(ZR)1877), the National Natural Science Foundation of China (62162021, 82160345, 61802092), the Key Research and Development Program of Hainan Province (ZDYF2020199, ZDYF2021GXJS017), and the Key Science and Technology Plan Project of Haikou (2011-016).
Abstract: As a critical infrastructure of cloud computing, data center networks (DCNs) directly determine the service performance of data centers, which provide computing services for various applications such as big data processing and artificial intelligence. However, current data center network architectures suffer from long routing paths and low fault tolerance between source and destination servers, which makes it hard to satisfy the requirements of high-performance data center networks. Based on dual-port servers and the Clos network structure, this paper proposes a novel architecture, RClos, for constructing high-performance data center networks. Logically, the proposed architecture is constructed by inserting a dual-port server between each pair of adjacent switches in the switch fabric, where the switches are connected in the form of a ring Clos structure. We describe the structural properties of RClos in terms of network scale, bisection bandwidth, and network diameter. The RClos architecture inherits the characteristics of its embedded Clos network, which can accommodate a large number of servers with a small average path length. The proposed architecture offers high fault tolerance, which makes it suitable for constructing various data center networks. For example, in RClos(32,5) the average path length between servers is 3.44 and the standardized bisection bandwidth is 0.8. The results of numerical experiments show that RClos enjoys a small average path length and high network fault tolerance, which are essential in the construction of high-performance data center networks.
Funding: Supported in part by the National Natural Science Foundation of China (61601252, 61801254), the Public Technology Projects of Zhejiang Province (LG-G18F020007), the Zhejiang Provincial Natural Science Foundation of China (LY20F020008, LY18F020011, LY20F010004), and the K.C. Wong Magna Fund in Ningbo University.
Abstract: With the emergence of diverse applications in data centers, the demands on quality of service have also become diverse, such as high throughput for elephant flows and low latency for deadline-sensitive flows. However, traditional TCPs are ill-suited to such situations and always make data transfers inefficient (e.g., missed flow deadlines and inevitable throughput collapse). This further degrades the user-perceived quality of service (QoS) in data centers. To reduce the flow completion time of mice and deadline-sensitive flows while promoting the throughput of elephant flows, this paper proposes an efficient and deadline-aware priority-driven congestion control (PCC) protocol, which grants mice and deadline-sensitive flows the highest priority. Specifically, PCC computes the priority of each flow according to the size of the transmitted data, the remaining data volume, and the flow's deadline. PCC then adjusts the congestion window according to the flow priority and the degree of network congestion. Furthermore, switches in data centers control the input and output of packets based on the flow priority and the queue length. Unlike existing TCPs, PCC provides an effective method to compute and encode the flow priority explicitly in order to speed up the data transfers of mice and deadline-sensitive flows. According to the flow priority, switches can manage packets efficiently and ensure the data transfers of high-priority flows through weighted priority scheduling with minor modification. The experimental results show that PCC can improve the data transfer performance of mice and deadline-sensitive flows while guaranteeing the throughput of elephant flows.
Funding: Supported by the ZTE-BJTU Collaborative Research Program under Grant No. K11L00190 and the Fundamental Research Funds for the Central Universities under Grant No. K12JB00060.
Abstract: 1 Introduction. The history of data centers can be traced back to the 1960s. Early data centers were deployed on mainframes that were time-shared by users via remote terminals. The boom in data centers came during the internet era. Many companies started building large internet-connected facilities,
Funding: Supported by the National Natural Science Foundation of China (The key trusted running technologies for the sensing nodes in Internet of things: 61501007) and the outstanding personnel training program of Beijing municipal Party Committee Organization Department (The Research of Trusted Computing environment for Internet of things in Smart City: 2014000020124G041).
Abstract: In view of the high operating costs and substantial energy waste in current data center network architectures, we propose a trusted flow preemption scheduling scheme combined with an energy-saving routing mechanism, based on a typical data center network architecture. The mechanism lets each network flow use its own exclusive link bandwidth and transmission path, which improves link utilization and network energy efficiency. Meanwhile, we apply trusted computing to guarantee a routing and forwarding service with high security, high performance, and high fault tolerance, which helps improve the average completion time of network flows.
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2015J009 and the Sichuan Province Scientific and Technological Support Project under Grants No. 2014GZ0017 and No. 2016GZ0093.
Abstract: In data centers, transmission control protocol (TCP) incast causes catastrophic goodput degradation for applications with a many-to-one traffic pattern. In this paper, we aim to tame incast at the receiver-side application. Towards this goal, we first develop an analytical model that formulates the incast probability as a function of connection variables and network environment settings. We combine the model with optimization theory and derive insights into minimizing the incast probability by tuning application-related connection variables. Then, guided by the analytical results, we propose an adaptive application-layer solution to TCP incast. The solution allocates advertised windows equally among concurrent connections and dynamically adapts the number of concurrent connections to varying conditions. Simulation results show that our solution consistently avoids incast and achieves high goodput in various scenarios, including those with multiple bottleneck links and background TCP traffic.
Funding: Supported by the National Basic Research Program of China (973 Program) under Grants No. 2014CB347800 and No. 2012CB315803, the National High-Tech R&D Program of China (863 Program) under Grant No. 2013AA013303, the Natural Science Foundation of China under Grants No. 61170291, No. 61133006, and No. 61161140454, and the ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Many "rich-connected" topologies with multiple parallel paths between servers have recently been proposed for data center networks to provide high bisection bandwidth, but it remains challenging to fully utilize the high network capacity with appropriate multi-path routing algorithms. As flow-level path splitting may lead to traffic imbalance between paths due to differences in flow size, packet-level path splitting, which spreads the packets of a flow across multiple available paths and significantly improves link utilization, has attracted more attention lately. However, it may cause packet reordering, confusing the TCP congestion control algorithm and lowering the throughput of flows. In this paper, we design a novel packet-level multi-path routing scheme called SOPA, which leverages OpenFlow to perform packet-level path splitting in a round-robin fashion, and hence significantly mitigates the packet reordering problem and improves the network throughput. Moreover, SOPA leverages the topological features of data center networks to encode a very small number of switches along the path into the packet header, resulting in very light overhead. Our simulations demonstrate that SOPA achieves 29.87%, 50.41% and 77.74% higher network throughput than random packet spraying (RPS), Hedera and equal-cost multi-path routing (ECMP), respectively, under the permutation workload, and reduces the average data transfer completion time by 53.65%, 343.31% and 348.25%, respectively, under the production workload.
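The round-robin packet-level splitting mentioned in this abstract can be pictured with the minimal Python sketch below. It only shows how successive packets of one flow cycle through the available paths; it does not model OpenFlow rules or SOPA's header encoding, and the class name `RoundRobinSplitter` and the path labels are assumptions made for this example.

```python
from collections import defaultdict


class RoundRobinSplitter:
    """Assign successive packets of each flow to paths in cyclic order (hypothetical)."""

    def __init__(self, paths):
        self.paths = list(paths)
        # Per-flow index of the path that the next packet will use.
        self.next_index = defaultdict(int)

    def pick_path(self, flow_id):
        index = self.next_index[flow_id]
        self.next_index[flow_id] = (index + 1) % len(self.paths)
        return self.paths[index]


splitter = RoundRobinSplitter(["path-0", "path-1", "path-2"])
print([splitter.pick_path("flow-a") for _ in range(4)])
```

Compared with picking a random path per packet, a deterministic cycle keeps the number of packets on each path within one of each other, which is one plausible reason a round-robin scheme limits reordering.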
Funding: Supported by the National Basic Research Program of China (973 Program) (2012CB315903), the Key Science and Technology Innovation Team Project of Zhejiang Province (2011R50010-05), the National Science and Technology Support Program (2014BAH24F01), the 863 Program of China (2012AA01A507), and the National Natural Science Foundation of China (61379118 and 61103200), and sponsored by the Research Fund of ZTE Corporation.
Abstract: In a data center network (DCN), load balancing is required when servers transfer data on the same path, in order to avoid congestion. Load balancing is challenged by dynamically changing transfer demands and complex routing control. Because of the distributed nature of traditional networks, previous research on load balancing has mostly focused on improving the performance of the local network; thus, the load has not been optimally balanced across the entire network. In this paper, we propose a novel dynamic load-balancing algorithm for fat-tree. This algorithm avoids congestion to the greatest possible extent by searching for non-conflicting paths in a centralized way. We implement the algorithm in the popular software-defined networking architecture and evaluate its performance on the Mininet platform. The results show that our algorithm achieves higher bisection bandwidth than the traditional equal-cost multi-path load-balancing algorithm and thus avoids congestion more effectively.
Abstract: With the continuous expansion of data center network scale, changing network requirements, and increasing pressure on network bandwidth, the traditional network architecture can no longer meet people's needs. The development of software-defined networking (SDN) has brought new opportunities and challenges to future networks. The separation of data and control in SDN improves the performance of the entire network. Researchers have integrated the SDN architecture into data centers to improve network resource utilization and performance. This paper first introduces the basic concepts of SDN and data center networks. It then discusses SDN-based load balancing mechanisms for data centers from different perspectives. Finally, it summarizes the research on SDN-based load balancing mechanisms and looks ahead to their development trends.
Funding: This paper was supported by the National Natural Science Foundation of China under Grant Nos. 61572337, 61702351, and 61602333, the Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks Foundation under Grant No. WSNLBKF201701, the China Postdoctoral Science Foundation under Grant No. 172985, the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant No. 17KJB520036, the Jiangsu Planned Projects for Postdoctoral Research Funds under Grant No. 1701172B, and the Application Foundation Research of Suzhou of China under Grant No. SYG201653.
Abstract: The capability of the data center network largely determines the performance of cloud computing. However, the number of servers in data center networks keeps growing because of the continuous growth of application requirements. Improving the performance of cloud computing therefore faces the great challenge of how to connect a large number of servers into a data center network with promising performance. Traditional tree-based data center networks have issues such as bandwidth bottlenecks and single-switch failures. Recently proposed data center networks such as DCell, FiConn, and BCube have larger bandwidth and better fault tolerance than traditional tree-based data center networks. Nonetheless, for DCell and FiConn, the length of the fault-tolerant path between servers increases when switches fail, and BCube requires higher-performance switches as its scale grows. Based on the above considerations, we propose a new server-centric data center network with excellent performance, called BCDC, based on the crossed cube. We then study the connectivity of BCDC networks. Furthermore, we propose communication algorithms and a fault-tolerant routing algorithm for BCDC networks. Moreover, we analyze the performance and time complexities of the proposed algorithms in BCDC networks. Our research will provide the basis for the design and implementation of a new family of data center networks.
Abstract: In the rising tide of the Internet of things, more and more things in the world are connected to the Internet. Recently, data have kept growing at a rate more than four times that expected under Moore's law. This explosion of data comes from various sources such as mobile phones, video cameras and sensor networks, which often present multidimensional characteristics. The huge amount of data brings many challenges for the IT infrastructures that manage, transport, and process it. To address these challenges, state-of-the-art large-scale data center networks have begun to provide cloud services that are increasingly prevalent. However, how to build a good data center remains an open challenge. Concurrently, the architecture design, which significantly affects the total performance, is of great research interest. This paper surveys advances in data center network design. We first introduce the upcoming trends in the data center industry. Then we review some popular design principles for today's data center network architectures. In the third part, we present some up-to-date data center frameworks and make a comprehensive comparison of them. During the comparison, we observe that there is no so-called optimal data center and that the design should differ according to the data placement, replication, processing, and query processing. After that, several existing challenges and limitations are discussed. Based on these observations, we point out some possible future research directions.
Funding: This work is funded by the National Natural Science Foundation of China under Grant No. 61772180 and the Key R&D Plan of Hubei Province (2020BHB004, 2020BAB012).
Abstract: According to Cisco's Internet Report 2020 white paper, there will be 29.3 billion connected devices worldwide by 2023, up from 18.4 billion in 2018. 5G connections will generate nearly three times more traffic than 4G connections. While this brings a boom to the network, it also presents unprecedented challenges for flow forwarding decisions. The path assignment mechanism used in traditional traffic scheduling methods tends to concentrate elephant flows and cause local network congestion, resulting in unbalanced network load and degraded quality of service. Using the centralized control of software-defined networks, this study proposes a data center traffic scheduling strategy for minimizing congestion and guaranteeing quality of service (MCQG). The ideal transmission path is selected for data flows while considering the network congestion rate and quality of service. Different traffic scheduling strategies are used according to the characteristics of the different service types in data centers. Elephant flows, which tend to cause local congestion, are rerouted. The path evaluation function is formed from the maximum link utilization on the path, the number of elephant flows, and the time delay, and the fast optimum-seeking capability of the sparrow search algorithm is used to find the path with the lowest actual link overhead as the rerouting path for the elephant flows, which reduces the likelihood of local network congestion. Equal-cost multi-path (ECMP) protocols, which have a faster response time, are used to schedule mouse flows with shorter durations, which guarantees the quality of service of the network. In this way, the various types of data streams are transmitted in isolation. The experimental results show that the proposed strategy has higher throughput, better network load balancing, and better robustness than ECMP under different traffic models. In addition, because it can fully utilize the resources in the network, MCQG also outperforms another traffic scheduling strategy that reroutes elephant flows (namely Hedera). Compared with ECMP and Hedera, MCQG improves average throughput by 11.73% and 4.29%, and normalized total throughput by 6.74% and 2.64%, respectively; MCQG improves link utilization by 23.25% and 15.07%; in addition, the average round-trip delay and packet loss rate fluctuate significantly less than with the two compared strategies.
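To make the path evaluation idea concrete, here is a minimal Python sketch that scores candidate paths from the three quantities named in the abstract: maximum link utilization, number of elephant flows, and delay. The weighted-sum form, the weights, and the exhaustive `min` search are all assumptions made for illustration; the abstract does not give the actual function, and the paper uses the sparrow search algorithm rather than exhaustive search.

```python
def path_cost(link_utilizations, elephant_flows, delay_ms,
              w_util=1.0, w_flows=1.0, w_delay=1.0):
    """Hypothetical cost: weighted sum of the three factors named in the abstract."""
    return (w_util * max(link_utilizations)
            + w_flows * elephant_flows
            + w_delay * delay_ms)


def best_path(candidates, **weights):
    """Return the name of the lowest-cost path.

    candidates maps a path name to (link_utilizations, elephant_flows, delay_ms).
    Exhaustive search stands in for the metaheuristic used in the paper.
    """
    return min(candidates, key=lambda name: path_cost(*candidates[name], **weights))


candidates = {
    "via-core-1": ([0.9, 0.2], 3, 1.0),
    "via-core-2": ([0.4, 0.5], 1, 1.5),
}
print(best_path(candidates))
```

In practice the three terms have different units, so some normalization would be needed before the weights become meaningful; the sketch leaves that out.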
Funding: Supported in part by the National Science Foundation of the USA (Nos. ECCS 1128209, CNS 10655444, CCF 1028167, CNS 0948184, and CCF 0830289).
Abstract: Data Center Networks (DCNs) are the fundamental infrastructure for cloud computing. Driven by the massively parallel computing tasks in cloud computing, one-to-many data dissemination has become one of the most important traffic patterns in DCNs. Many architectures and protocols have been proposed to meet this demand. However, these proposals either require complicated configurations on switches and servers or cannot deliver optimal performance. In this paper, we propose peer-assisted data dissemination for DCNs. This approach utilizes the rich physical connections with high bandwidths and multi-path connections to facilitate efficient one-to-many data dissemination. We prove that an optimal P2P data dissemination schedule exists for FatTree, a specially designed DCN architecture. We then present a theoretical analysis of this algorithm in the general multi-rooted tree topology, a widely used DCN architecture. Additionally, we explore the performance of an intuitive line structure for data dissemination. Our analysis and experimental results show that this simple structure is able to achieve performance comparable to that of the optimal algorithm. Since DCN applications rely heavily on virtualization to achieve optimal resource sharing, we present a general implementation method for the proposed algorithms, which aims to mitigate the impact of the potentially high churn rate of the virtual machines.
Funding: Supported by the Research Fund of Ministry of Education-China Mobile (MCM20160304).
Abstract: In modern data centers, the power consumed by the network is a noticeable portion of the total energy budget, and thus improving the energy efficiency of data center networks (DCNs) truly matters. One effective way to improve energy efficiency is to make the size of DCNs elastic with respect to traffic demands through flow consolidation and bandwidth scheduling, i.e., turning off unnecessary network components to reduce power consumption. Meanwhile, with its inherent support for data center management, software-defined networking (SDN) provides a paradigm for elastically controlling the resources of DCNs. To achieve such power savings, most prior efforts simply adopt a greedy heuristic to reduce computational complexity. However, due to the inherent limitations of greedy algorithms, a good-enough optimization cannot always be guaranteed. To address this problem, a modified hybrid genetic algorithm (MHGA) is employed to improve the accuracy of the solution, and the fine-grained routing function of SDN is fully leveraged. The simulation results show that more efficient power management can be achieved than in previous studies, increasing network energy savings by about 5%.
Funding: Project supported by the Malaysian Ministry of Higher Education under the University of Malaya High Impact Research Grant (No. UM.C/HIR/MOHE/FCSIT/03).
Abstract: The data center network (DCN), which is an important component of data centers, consists of a large number of hosted servers and switches connected by high-speed communication links. A DCN enables the centralized deployment of resources and gives users on-demand access to the information and services of data centers. In recent years, the scale of the DCN has constantly increased with the widespread use of cloud-based services and the unprecedented amount of data delivered within and between data centers, whereas the traditional DCN architecture lacks the aggregate bandwidth, scalability, and cost effectiveness to cope with the increasing demands of tenants accessing the services of cloud data centers. Therefore, the design of a novel DCN architecture that is scalable, low-cost, robust, and energy-conserving is required. This paper reviews recent research findings and technologies of DCN architectures to identify the issues in existing DCN architectures for cloud computing. We develop a taxonomy for classifying current DCN architectures and qualitatively analyze traditional and contemporary DCN architectures. Moreover, the DCN architectures are compared on the basis of significant characteristics such as bandwidth, fault tolerance, scalability, overhead, and deployment cost. Finally, we put forward open research issues in the deployment of scalable, low-cost, robust, and energy-efficient DCN architectures for data centers in computational clouds.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61370209 and 61402230).
Abstract: Cloud data centers now provide a plethora of rich online applications such as web search, social networking, and cloud computing. A key challenge for such applications, however, is to meet soft real-time constraints. Due to the deadline-agnostic congestion control in the Transmission Control Protocol (TCP), many deadline-sensitive flows cannot finish transmission before their deadlines. In this paper, we propose an SDN-based Explicit-Deadline-aware TCP (SED) for cloud Data Center Networks (DCNs). SED first assigns a base rate to non-deadline flows and gives as much spare bandwidth as possible to the deadline flows. Subsequently, a Retransmission-enhanced SED (RSED) is introduced to solve the packet-loss timeout problem. Through our experiments, we show that SED enables flows to meet their deadlines effectively and that it significantly outperforms previous protocols in the cloud data center environment.
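The two-step allocation described here (a base rate for non-deadline flows first, then the spare bandwidth to deadline flows) might look roughly like the Python sketch below on a single link. The function name `allocate_rates`, its parameters, and the rule of sharing the spare bandwidth in proportion to each deadline flow's demand are assumptions for illustration; the abstract does not specify how SED divides the spare bandwidth.

```python
def allocate_rates(capacity, base_rate, n_non_deadline, deadline_demands):
    """Split one link's capacity between non-deadline and deadline flows.

    Returns (bandwidth reserved for non-deadline flows, {deadline flow: rate}).
    """
    # Step 1: reserve the base rate for every non-deadline flow, capped by capacity.
    reserved = min(capacity, base_rate * n_non_deadline)
    # Step 2: everything left over goes to the deadline flows.
    spare = capacity - reserved
    total_demand = sum(deadline_demands.values())
    if total_demand == 0:
        return reserved, {flow: 0.0 for flow in deadline_demands}
    # Assumption: share the spare bandwidth in proportion to each flow's demand.
    shares = {flow: spare * demand / total_demand
              for flow, demand in deadline_demands.items()}
    return reserved, shares


reserved, shares = allocate_rates(100, 5, 4, {"flow-a": 30, "flow-b": 10})
print(reserved, shares)
```

A demand could be taken as the remaining bytes divided by the time left until the deadline, which is a common way to express the rate a deadline flow needs.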
Funding: Supported by the National Natural Science Foundation of China (61002011), the Open Fund of the State Key Laboratory of Software Development Environment (SKLSDE-2009KF-2-08), the National Basic Research Program of China (2009CB320505), and the Hi-Tech Research and Development Program of China (2011AA01A102).
Abstract: Ethernet link aggregation, which provides an easy and cost-effective way to increase both bandwidth and link availability between a pair of devices, is well suited to data center networks. However, all the traffic splitting algorithms used in existing Ethernet link aggregation are flow-level, and they do not work well given the traffic characteristics of data centers. Although frame-level traffic splitting can achieve optimal load balance and the maximum benefit from aggregated capacity, it is generally deprecated because of frame disordering, which can disrupt the operation of many Internet protocols, most notably the transmission control protocol (TCP). To address this issue, we first investigate the causes of frame disordering in link aggregation and find that all of them either no longer hold or can be prevented in data centers. We then present a byte-counter frame-level traffic splitting algorithm that achieves optimal performance while causing no frame disordering. The only requirement is that the frames in a flow are the same size, which can easily be met in data centers. Simulation results show that the proposed frame-level traffic splitting method achieves higher throughput and optimal load balance. The completion time of different-sized flows is reduced by 24% on average and by up to 46%.
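One plausible reading of byte-counter frame-level splitting is sketched below in Python: each member link of the aggregation group keeps a counter of bytes sent, and every frame goes to the link with the smallest counter. This is an illustrative guess at the general idea, not the algorithm from the paper; the class name `ByteCounterSplitter` and the tie-breaking rule (lowest link index) are assumptions.

```python
class ByteCounterSplitter:
    """Send each frame on the member link that has carried the fewest bytes (hypothetical)."""

    def __init__(self, n_links):
        self.bytes_sent = [0] * n_links

    def send(self, frame_len):
        # Pick the least-loaded link; ties resolve to the lowest link index.
        link = min(range(len(self.bytes_sent)), key=self.bytes_sent.__getitem__)
        self.bytes_sent[link] += frame_len
        return link


splitter = ByteCounterSplitter(3)
print([splitter.send(1500) for _ in range(4)], splitter.bytes_sent)
```

With equal-sized frames, as the abstract requires, this rule degenerates to a strict cycle over the links, so frames of a flow leave in a predictable order across links of equal speed.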
Funding: National Key Research and Development Program of China (2018YFB2101300), National Natural Science Foundation of China (61872147), Dean's Fund of Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education (East China Normal University).
Abstract: To support the needs of ever-growing cloud-based services, the number of servers and network devices in data centers is increasing exponentially, which in turn makes network optimization highly complex and difficult. Machine learning (ML) provides an effective way to deal with these challenges by enabling network intelligence. To this end, numerous creative ML-based approaches have been put forward in recent years. Nevertheless, the intelligent optimization of data center networks (DCNs) still faces enormous challenges. To the best of our knowledge, there is a lack of systematic and original investigations with in-depth analysis of intelligent DCNs. In this paper, we therefore investigate the application of ML to DCN optimization and provide a general overview and in-depth analysis of recent works, covering flow prediction, flow classification, and resource management. Moreover, we give unique insights into the technological evolution of the fusion of DCN and ML, together with some challenges and future research opportunities.
Funding: Supported by the National Key R&D Program of China (No. 2017YFB1003000), the National Natural Science Foundation of China (Nos. 61872079, 61572129, 61602112, 61502097, 61702096, 61320106007, 61632008, and 61702097), the Natural Science Foundation of Jiangsu Province (Nos. BK20160695 and BK20170689), the Fundamental Research Funds for the Central Universities (No. 2242018k1G019), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), and the Key Laboratory of Computer Network and Information Integration of Ministry of Education of China (No. 93K-9), and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization and the Collaborative Innovation Center of Wireless Communications Technology.
Abstract: With the continuous enrichment of cloud services, an increasing number of applications are being deployed in data centers. These emerging applications are often communication-intensive and data-parallel, and their performance is closely related to the underlying network. Because of their distributed nature, the applications consist of tasks that each involve a collection of parallel flows. Traditional techniques that optimize flow-level metrics are agnostic to task-level requirements, leading to poor application-level performance. In this paper, we address the heterogeneous task-level requirements of applications and propose task-aware flow scheduling. First, we model tasks' sensitivity to their completion time by utilities. Second, on the basis of Nash bargaining theory, we establish a flow scheduling model with heterogeneous utility characteristics and analyze it using the Lagrange multiplier method and the KKT conditions. Third, we propose two utility-aware bandwidth allocation algorithms with different practical constraints. Finally, we present Tasch, a system that enables tasks to maintain high utilities and guarantees the fairness of utilities. To demonstrate the feasibility of our system, we conduct comprehensive evaluations with a real-world traffic trace. Compared with per-flow mechanisms, with Tasch communication stages complete up to 1.4× faster on average, task utilities increase by up to 2.26×, and the fairness of tasks improves by up to 8.66×.
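For readers unfamiliar with the machinery this abstract names, the following is the generic textbook form of a Nash bargaining allocation on a single shared link, together with its Lagrangian and KKT conditions. It is not the model from the paper: the single capacity constraint $C$, the zero disagreement point, and the assumption that each utility $U_i$ is positive, increasing, and concave in the allocated bandwidth $x_i$ are all simplifications made for this illustration.

```latex
\begin{aligned}
\max_{x_1,\dots,x_n}\quad & \sum_{i=1}^{n} \log U_i(x_i) \\
\text{s.t.}\quad & \sum_{i=1}^{n} x_i \le C, \qquad x_i \ge 0, \\[4pt]
\mathcal{L}(x,\lambda) &= \sum_{i=1}^{n} \log U_i(x_i) - \lambda\Big(\sum_{i=1}^{n} x_i - C\Big), \\
\frac{U_i'(x_i)}{U_i(x_i)} &= \lambda \ \text{ for each } x_i > 0, \qquad
\lambda \ge 0, \qquad \lambda\Big(\sum_{i=1}^{n} x_i - C\Big) = 0 .
\end{aligned}
```

Maximizing the sum of log-utilities is equivalent to maximizing the product of utilities, which is the Nash bargaining solution under these assumptions. The stationarity condition says that at the optimum every task with a positive allocation has the same relative marginal utility, which is where the fairness property comes from.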