With the continuous expansion of data center networks, changing network requirements, and increasing pressure on network bandwidth, traditional network architectures can no longer meet users' needs. The development of software-defined networking (SDN) has brought new opportunities and challenges to future networks. SDN's separation of the data and control planes improves the performance of the entire network, and researchers have integrated the SDN architecture into data centers to improve network resource utilization and performance. This paper first introduces the basic concepts of SDN and data center networks, then discusses SDN-based load balancing mechanisms for data centers from different perspectives, and finally summarizes the state of research on SDN-based load balancing and its development trends.
Data centers are being distributed worldwide by cloud service providers (CSPs) to save energy costs through efficient workload allocation strategies. Many CSPs are challenged by the significant rise in user demands because of the extensive energy consumed during workload processing. Numerous studies have examined operating-cost mitigation techniques for geo-distributed data centers (DCs); however, operating-cost savings during workload processing that also exploit string-matching techniques in geo-distributed DCs remain unexplored. In this research, we propose a novel string-matching-based geographical load balancing (SMGLB) technique to mitigate the operating cost of geo-distributed DCs. The primary goal of this study is to use a string-matching algorithm (Boyer-Moore) to compare the contents of incoming workloads with those of documents already processed in a data center. A successful match prevents the global load balancer from sending the user's request to a data center for processing; instead, the result of the previously processed workload is returned to the user, saving energy. If no match is found, the global load balancer allocates the incoming workload to a specific DC for processing, considering variable energy prices, the number of active servers, on-site green energy, and the incoming workload traces. Numerical evaluations show that SMGLB reduces the operating expenses of geo-distributed data centers more than existing workload distribution techniques.
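For illustration, the sketch below implements a minimal Boyer-Moore search (bad-character heuristic only) in Python and shows how a cache hit could short-circuit dispatch; the function name, the cached text, and the surrounding usage are hypothetical assumptions, not the authors' SMGLB implementation.

```python
def boyer_moore_search(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1.
    Simplified Boyer-Moore using only the bad-character heuristic."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    last = {c: i for i, c in enumerate(pattern)}  # last position of each char
    s = 0  # current shift of the pattern over the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s  # full match at shift s
        # Align the mismatching text character with its last occurrence
        # in the pattern (or shift past it if it never occurs).
        s += max(1, j - last.get(text[s + j], -1))
    return -1

# Hypothetical use inside the global load balancer: serve from cache on a hit.
cached_document = "results of a previously processed analytics workload"
incoming_request = "previously processed analytics"
if boyer_moore_search(cached_document, incoming_request) != -1:
    print("cache hit: return stored result, skip data center dispatch")
else:
    print("no match: allocate the workload to a data center")
```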
Globally, digital technology and the digital economy have propelled technological revolution and industrial change, and the digital economy has become one of the main arenas of international industrial competition. China's digital economy was estimated to reach 50 trillion yuan in 2022, accounting for more than 40% of GDP and presenting great market potential and room for growth. With the rapid development of the digital economy, the state attaches great importance to the construction of digital infrastructure and has introduced a series of policies to promote its systematic development and large-scale deployment. In 2022, the Chinese government planned to build eight national computing hubs and ten national data center clusters nationwide. To proactively address the future demand for AI across various scenarios, a well-structured computing power infrastructure is needed. The data center, serving as the pivotal hub of computing power, has evolved from the conventional cloud center to a more intelligent computing center, allowing for a diversified convergence of computing power supply. In addition, data centers accommodate a diverse array of customer computing workloads, reflecting a multi-industry development trend. The computing service platform is consistently broadening its scope, with ongoing optimization and innovation in machine room process design. The widespread application of immersion phase-change liquid cooling and cold plate cooling introduces a series of new challenges to the construction of digital infrastructure. This paper delves into the design objectives, industry considerations, layout, and other dimensions of a smart computing center and proposes a new-generation data center solution that is "flexible, resilient, green, and low-carbon."
With the rapid development of technologies such as big data and cloud computing, exponential growth in data communication and data computing has led to large energy consumption in data centers. Globally, data centers are set to become among the world's largest energy consumers, with their share rising from 3% in 2017 to 4.5% in 2025. Owing to its unique climate and energy-saving advantages, the high-latitude Pan-Arctic region has gradually become a hotspot for data center site selection in recent years. To predict and analyze the future energy consumption and carbon emissions of global data centers, this paper presents a new prediction method based on global data center traffic and power usage effectiveness (PUE). First, global data center traffic growth is predicted based on Cisco's research. Second, the dynamic global average PUE and the high-latitude PUE are obtained from the Romonet simulation model, and global data center energy consumption under two scenarios, decentralized and centralized, is analyzed quantitatively via polynomial fitting. The simulation results show that, in 2030, global data center energy consumption and carbon emissions are reduced by about 301 billion kWh and 720 million tons of CO2 in the centralized scenario compared with the decentralized scenario, confirming that establishing data centers in the Pan-Arctic region can effectively relieve climate change and energy problems. This study provides support for global energy consumption prediction and guidance for the layout of future global data centers from the perspective of energy consumption. It also supports the feasibility of integrating energy and information networks under the Global Energy Interconnection concept.
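As a rough illustration of the traffic-and-PUE-based prediction idea, the following sketch fits a polynomial to fabricated traffic and PUE figures and extrapolates energy consumption; every number and the proportionality constant are assumptions, not the paper's data or model.

```python
import numpy as np

# Hypothetical historical points: year, global DC traffic, average PUE.
years   = np.array([2017, 2019, 2021, 2023, 2025])
traffic = np.array([9.7, 14.1, 19.5, 26.0, 33.8])   # assumed traffic units
pue     = np.array([1.58, 1.55, 1.50, 1.46, 1.40])  # assumed PUE values

# Assume IT energy is proportional to traffic; facility energy = IT * PUE.
k = 30.0                       # illustrative kWh per traffic unit
energy = k * traffic * pue     # illustrative billion kWh

# Fit a 2nd-order polynomial in the year and extrapolate to 2030.
coeffs = np.polyfit(years, energy, deg=2)
print("Predicted 2030 energy (illustrative):", np.polyval(coeffs, 2030))
```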
How to effectively reduce the energy consumption of large-scale data centers is a key issue in cloud computing. This paper presents a novel low-power task scheduling algorithm (L3SA) for large-scale cloud data centers. A winner tree is introduced in which the data nodes form the leaf nodes of the tree, and the final winner is selected with the goal of reducing energy consumption. The complexity of large-scale cloud data centers is fully considered, and a task comparison coefficient is defined to make the task scheduling strategy more reasonable. Experiments and performance analysis show that the proposed algorithm can effectively improve node utilization and reduce the overall power consumption of the cloud data center.
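The sketch below shows a toy winner (tournament) tree that selects the least-loaded data node as the overall winner; it only illustrates the data structure, not the L3SA algorithm itself (the task comparison coefficient and energy model are omitted, and the loads are fabricated).

```python
def build_winner_tree(loads):
    """Build a tournament (winner) tree over node loads; tree[1] holds the
    index of the leaf node with the smallest load. Minimal sketch."""
    n = len(loads)
    size = 1
    while size < n:
        size *= 2
    tree = [None] * (2 * size)           # internal nodes store winning leaf index
    for i in range(size):
        tree[size + i] = i if i < n else None
    for i in range(size - 1, 0, -1):
        l, r = tree[2 * i], tree[2 * i + 1]
        if l is None:
            tree[i] = r
        elif r is None:
            tree[i] = l
        else:
            tree[i] = l if loads[l] <= loads[r] else r
    return tree

loads = [0.72, 0.35, 0.90, 0.41]         # hypothetical utilization of data nodes
tree = build_winner_tree(loads)
print("Task goes to node", tree[1])      # -> node 1 (lowest load)
```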
The development of cloud computing and virtualization technology has brought great challenges to the reliability of data center services. Data centers typically contain a large number of compute and storage nodes that may fail and affect the quality of service. Failure prediction is an important means of ensuring service availability. Predicting node failure in cloud-based data centers is challenging because the failure symptoms have complex characteristics and the distribution imbalance between failure samples and normal samples is widespread, resulting in inaccurate failure prediction. Targeting these challenges, this paper proposes a novel failure prediction method, FP-STE (Failure Prediction based on Spatio-Temporal feature Extraction). First, an improved recurrent neural network, HW-GRU (improved GRU based on a HighWay network), and a convolutional neural network (CNN) are used to extract the temporal and spatial features of multivariate data, respectively, to increase the discrimination of different types of failure symptoms and thereby improve prediction accuracy. The intermediate results of the two models are then added as features into SCS-XGBoost to predict the likelihood and the precise type of node failure in the future. SCS-XGBoost is an ensemble learning model improved by an integrated strategy of oversampling and cost-sensitive learning. Experimental results based on real data sets confirm the effectiveness and superiority of FP-STE.
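One common way to combine oversampling with cost-sensitive learning in the ensemble stage is sketched below using SMOTE and XGBoost on fabricated features; the data, hyperparameters, and class weight are placeholders, not the paper's SCS-XGBoost configuration.

```python
import numpy as np
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE

# Fabricated feature matrix: rows = nodes, columns = extracted temporal and
# spatial features; y = 1 for the (rare) failed nodes, so classes are imbalanced.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
y = (rng.random(1000) < 0.05).astype(int)

# Oversample the minority class, then penalize errors on failures more heavily.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
clf = XGBClassifier(n_estimators=200, max_depth=4,
                    scale_pos_weight=5.0, eval_metric="logloss")
clf.fit(X_res, y_res)
print("predicted failure probability of node 0:", clf.predict_proba(X[:1])[0, 1])
```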
We consider differentiated time-critical task scheduling in an N×N input-queued optical packet switch to ensure 100% throughput and meet different delay requirements among various modules of a data center. Existing schemes either consider slot-by-slot scheduling with queue depth serving as the delay metric or assume that each input-output connection has the same delay bound in the batch scheduling mode. The former neglects the effect of reconfiguration overhead, which may cripple system performance, while the latter cannot satisfy users' differentiated Quality of Service (QoS) requirements. To remedy these deficiencies, we propose a new batch scheduling scheme that meets the various port-to-port delay requirements in a best-effort manner. Moreover, a speedup is considered to compensate for both the reconfiguration overhead and the unavoidable slot wastage in the switch fabric. Given the traffic matrix and the delay constraint matrix, this paper proposes two heuristic algorithms, Stringent Delay First (SDF) and m-order SDF (m-SDF), to realize 100% packet switching while maximizing the delay-constraint satisfaction ratio. The performance of our scheme is verified by extensive numerical simulations.
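A greedy sketch in the spirit of "stringent delay first" is shown below: connections with the tightest delay bounds are placed into the earliest conflict-free batch. It ignores reconfiguration overhead and speedup and is not the paper's exact SDF/m-SDF algorithm; the demand values are fabricated.

```python
def stringent_delay_first(demands, num_batches, batch_len):
    """Greedy sketch: serve the connection with the tightest delay bound first,
    placing it in the earliest batch where both ports are free."""
    # demands: list of (input_port, output_port, delay_bound_in_slots)
    schedule = [dict() for _ in range(num_batches)]   # batch -> {input: output}
    satisfied = 0
    for i, o, bound in sorted(demands, key=lambda d: d[2]):
        for b in range(num_batches):
            if (b + 1) * batch_len > bound:
                break                                 # later batches finish later
            if i not in schedule[b] and o not in schedule[b].values():
                schedule[b][i] = o
                satisfied += 1
                break
    return schedule, satisfied

demands = [(0, 1, 4), (1, 0, 2), (0, 2, 8)]           # (in, out, delay bound)
print(stringent_delay_first(demands, num_batches=3, batch_len=2))
```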
Global data traffic is growing rapidly, and the demand for optoelectronic transceivers applied in data centers (DCs) is also increasing correspondingly. In this review, we first briefly introduce the development of optoelectronic transceivers in DCs, as well as the advantages of silicon photonic chips fabricated by the complementary metal-oxide-semiconductor process. We also summarize research on the main components in silicon photonic transceivers. In particular, quantum dot lasers have shown great potential as light sources for silicon photonic integration, whether through bonding or monolithic integration, thanks to their unique advantages over the conventional quantum-well counterparts. Some of the solutions for high-speed optical interconnection in DCs are then discussed; among them, wavelength division multiplexing and four-level pulse-amplitude modulation have been widely studied and applied. At present, the application of coherent optical communication technology has moved from the backbone network to the metro network, and then to DCs.
Based on the Saudi Green Initiative, which aims to improve the Kingdom's environmental status, reduce carbon emissions by more than 278 million tons by 2030, and achieve net-zero carbon by 2060, NEOM city has been proposed as the Saudi hub for green energy, since NEOM is estimated to generate up to 120 gigawatts (GW) of renewable energy by 2030. Nevertheless, the Information and Communication Technology (ICT) sector is a key contributor to global energy consumption and carbon emissions; data centers are estimated to consume about 13% of global electricity demand by 2030. Reducing the total carbon emissions of the ICT sector is therefore vital to the Saudi plan for minimizing global carbon emissions. This paper proposes an eco-friendly approach using a Mixed-Integer Linear Programming (MILP) model to reduce the carbon emissions associated with ICT infrastructure in Saudi Arabia. The approach considers the Saudi National Fiber Network (SNFN) as the backbone of Saudi Internet infrastructure. First, we compare two scenarios for data center locations: the first considers traditional cloud data centers located in Jeddah and Riyadh, whereas the second considers NEOM as a potential new cloud data center location that takes advantage of its green energy infrastructure. Then, we calculate the energy consumption and carbon emissions of the cloud data centers and their associated energy costs. After that, we optimize the energy efficiency of the different cloud data center locations in the SNFN to reduce the associated carbon emissions and energy costs. Simulation results show that the proposed approach can save up to 94% of the carbon emissions and 62% of the energy cost compared to the current cloud physical topology. These savings are achieved by shifting cloud data centers from cities with conventional energy sources to a city rich in renewable energy sources. Finally, we design a heuristic algorithm to verify the proposed approach; it gives results equivalent to the MILP model.
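A toy allocation model in the spirit of the paper's formulation (here a plain linear program rather than the full MILP) can be written with PuLP as follows; the site names come from the abstract, but every carbon intensity, capacity, and demand figure is invented for illustration.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

# Illustrative data: candidate sites with assumed carbon intensity (kg CO2/kWh)
# and capacity, plus per-city demand in kWh-equivalent. All numbers fabricated.
sites = {"Jeddah": 0.60, "Riyadh": 0.62, "NEOM": 0.02}
capacity = {"Jeddah": 500, "Riyadh": 500, "NEOM": 800}
demand = {"cityA": 300, "cityB": 400}

prob = LpProblem("dc_placement", LpMinimize)
x = LpVariable.dicts("x", [(c, s) for c in demand for s in sites], lowBound=0)

# Objective: total carbon emissions of serving all demand.
prob += lpSum(x[(c, s)] * sites[s] for c in demand for s in sites)
for c in demand:                                  # serve each city's demand
    prob += lpSum(x[(c, s)] for s in sites) == demand[c]
for s in sites:                                   # respect site capacity
    prob += lpSum(x[(c, s)] for c in demand) <= capacity[s]

prob.solve()
for (c, s), var in x.items():
    if var.varValue:
        print(f"{c} -> {s}: {var.varValue} kWh")
```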
The primary focus of this paper is to design a progressive restoration plan for an enterprise data center environment following a partial or full disruption. Repairing and restoring disrupted components in an enterprise data center requires a significant amount of time and human effort. Following a major disruption, the recovery process involves multiple stages, and during each stage the partially recovered infrastructures can provide limited services to users at some degraded service level. However, how fast and efficiently an enterprise infrastructure can be recovered depends on how the recovery mechanism restores the disrupted components, considering the inter-dependencies between services, along with the limitations of expert human operators. The entire problem turns out to be NP-hard and rather complex, and we devise an efficient meta-heuristic to solve it. By considering some real-world examples, we show that the proposed meta-heuristic provides very accurate results, and still runs 600-2800 times faster than the optimal solution obtained from a general-purpose mathematical solver [1].
Edge data centers (EDCs) have been widely developed recently to supply delay-sensitive computing services, which imposes prohibitively increasing electricity costs on EDC operators. This paper presents a new spatiotemporal reallocation (STR) method for energy management in EDCs. The method uses spare resources, including servers and energy storage systems (ESSs) within EDCs, to reduce energy costs based on both the spatial and temporal features of those spare resources. The solution: 1) reallocates flexible workload between EDCs within one cluster; and 2) coordinates the electricity load of data processing, ESSs, and distributed energy resources (DERs) within one EDC cluster to gain benefits from flexible electricity tariffs. In addition, this paper for the first time develops a Bit-Watt transformation to simplify the STR method and represent the relationship between data workload and the electricity consumption of EDCs. Case studies show that the developed STR method delivers satisfying cost reductions with robustness. The STR method fully utilizes both the spatial and temporal features of spare resources in EDCs to gain benefits from 1) varying electricity tariffs and 2) maximizing the consumption of DER generation.
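The idea of mapping data workload to electricity consumption can be illustrated with a common linear server power model; the function below is an assumed stand-in for the paper's Bit-Watt transformation, and the numbers are fabricated.

```python
def bit_to_watt(bit_rate_gbps, capacity_gbps, p_idle_kw, p_peak_kw):
    """Toy workload-to-power mapping: interpolate between idle and peak power
    in proportion to utilization derived from the processed bit rate."""
    util = min(1.0, bit_rate_gbps / capacity_gbps)
    return p_idle_kw + (p_peak_kw - p_idle_kw) * util

# An EDC cluster processing 40 Gb/s out of a 100 Gb/s capacity (illustrative).
print(bit_to_watt(40, 100, p_idle_kw=20.0, p_peak_kw=60.0))  # -> 36.0 kW
```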
Data centers are often equipped with multiple cooling units, and an aquifer thermal energy storage (ATES) system has been shown to be efficient here. However, the usage of hot- and cold-water wells in the ATES must be balanced for legal and environmental reasons. Reinforcement learning has proven to be a useful tool for optimizing cooling operation at data centers. Nonetheless, since cooling demand changes continuously, balancing ATES usage on a yearly basis imposes an additional challenge in the form of a delayed reward. To overcome this, we formulate a return decomposition, Cool-RUDDER, which relies on simple domain knowledge and needs no training. We trained a proximal policy optimization agent to keep server temperatures steady while minimizing operational costs. Comparing the Cool-RUDDER reward signal to other ATES-associated rewards, all models kept the server temperatures steady at around 30 °C. An optimal ATES balance was defined to be 0%, and a yearly imbalance of −4.9% with a confidence interval of [−6.2, −3.8]% was achieved for the Cool 2.0 reward. This outperformed a baseline ATES-associated reward of 0, which reached −16.3% with a confidence interval of [−17.1, −15.4]%, and all other ATES-associated rewards. However, the improved ATES balance comes with a 12.5% higher energy consumption cost when comparing the relative cost of the Cool 2.0 reward to the zero reward, resulting in a trade-off. Moreover, the method has limited requirements and is applicable to any long-term problem satisfying a linear state-transition system.
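The core idea of return decomposition, turning one delayed yearly signal into per-step feedback, can be illustrated as below; this proportional redistribution is only a toy stand-in, whereas Cool-RUDDER's actual decomposition is built from domain knowledge of the ATES balance.

```python
def redistribute_terminal_reward(contributions, terminal_reward):
    """Toy return decomposition: spread a delayed end-of-episode reward over
    the steps in proportion to each step's measured contribution, so the agent
    receives immediate feedback instead of one signal per year."""
    total = sum(abs(c) for c in contributions) or 1.0
    return [terminal_reward * abs(c) / total for c in contributions]

# e.g., hourly net extraction from the warm well (fabricated numbers)
print(redistribute_terminal_reward([0.2, -0.5, 0.1, 0.4], terminal_reward=-1.0))
```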
Fault detection and diagnosis are essential to the air conditioning system of a data center for improving reliability and reducing energy consumption. This study proposed a convolutional neural network (CNN)-based, data-driven fault detection and diagnosis model considering temporal dependency for a composite air conditioning system capable of cooling the high heat flux in data centers. The input of the fault detection and diagnosis model was an unsteady dataset generated by an experimentally validated transient mathematical model. The dataset concerned three typical faults: refrigerant leakage, evaporator fan breakdown, and condenser fouling. The CNN model was then trained to construct a map between the input and the system operating conditions. Further, the performance of the CNN model was validated by comparing it with a support vector machine and a neural network. Finally, the score-weighted class activation mapping method was utilized to interpret the model's diagnosis mechanism and to identify key input features in various operating modes. The results demonstrated that in the pump-driven heat pipe mode, the accuracy of the CNN model was 99.14%, an increase of around 8.5% over the other two methods. In the vapor compression mode, the accuracy of the CNN model reached 99.9% and the miss rate for refrigerant leakage was reduced by at least 61%. The score-weighted class activation mapping results indicated that the ambient temperature and the actuator-related parameters, such as compressor frequency in vapor compression mode and condenser fan frequency in pump-driven heat pipe mode, were essential features for system fault detection and diagnosis.
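A minimal 1-D CNN classifier over sliding windows of sensor readings is sketched below in PyTorch; the layer sizes, sensor count, and fault classes are illustrative assumptions, not the architecture trained in the study.

```python
import torch
import torch.nn as nn

class FaultCNN(nn.Module):
    """Minimal 1-D CNN sketch for fault classification over a window of
    sensor readings (channels = sensors, length = time steps)."""
    def __init__(self, n_sensors=8, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_sensors, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                 # x: (batch, n_sensors, window)
        h = self.features(x).squeeze(-1)
        return self.classifier(h)         # logits over {normal, leak, fan, fouling}

model = FaultCNN()
dummy = torch.randn(4, 8, 32)             # 4 windows of fabricated sensor data
print(model(dummy).shape)                  # torch.Size([4, 4])
```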
With the promotion of the "dual carbon" strategy, data center (DC) access to high-penetration renewable energy sources (RESs) has become an industry trend. However, the uncertainty of RES poses challenges to the safe and stable operation of DCs and power grids. In this paper, a multi-timescale optimal scheduling model is established for interconnected data centers (IDCs) based on model predictive control (MPC), including day-ahead optimization, intraday rolling optimization, and intraday real-time correction. The day-ahead stage minimizes operating cost, the rolling stage minimizes intraday economic cost, and the real-time correction minimizes power fluctuation, eliminating the impact of prediction errors through coordinated multi-timescale optimization. The simulation results show that the economic loss is reduced by 19.6% and the power fluctuation is decreased by 15.23%.
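The rolling-optimization stage follows the usual receding-horizon pattern: optimize over a forecast window, commit only the first step, then roll forward. The toy dispatcher below illustrates that structure with a fabricated renewable forecast; it is not the paper's three-stage model.

```python
import numpy as np

def rolling_dispatch(res_forecast, deferrable_load, horizon=4):
    """Toy receding-horizon dispatch: each hour, look `horizon` hours ahead in
    the renewable forecast, place the deferrable load in the greenest upcoming
    hour, but commit only the current hour's decision before rolling forward."""
    committed = []
    remaining = deferrable_load
    for t in range(len(res_forecast)):
        window = res_forecast[t:t + horizon]
        best = int(np.argmax(window))          # greenest hour in the window
        run_now = remaining if best == 0 else 0.0
        committed.append(run_now)              # apply only the first-step decision
        remaining -= run_now
    return committed

print(rolling_dispatch([0.2, 0.9, 0.4, 0.8, 0.1], deferrable_load=10.0))
```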
In today's fast-paced, information-driven world, data centers can offer high-speed, intricate capabilities on a larger scale owing to the ever-growing demand for networks and information systems. Because data centers process and transmit information, stability and reliability are important. Data center power supply architectures rely heavily on isolated bidirectional DC-DC converters to ensure safety and stability. For the smooth operation of a data center, the power supply must be reliable and uninterrupted. In this study, we summarize the basic principles, topologies, switch conversion strategies, and control technologies of existing isolated bidirectional DC-DC converters. Subsequently, existing research results and problems with isolated bidirectional DC-DC converters are reviewed. Finally, future trends in the development of isolated bidirectional DC-DC converters for data centers are presented, which offer valuable insights for solving engineering obstacles and future research directions in the field.
To enhance the resilience of power systems with offshore wind farms (OWFs), a proactive scheduling scheme is proposed to unlock the flexibility of cloud data centers (CDCs) in responding to the uncertain spatial and temporal impacts induced by hurricanes. Total life simulation (TLS) is adopted to project the local weather conditions at transmission lines and OWFs before, during, and after the hurricane. The static power curve of wind turbines (WTs) is used to capture the output of OWFs, and fragility analysis of transmission-line components is used to formulate the time-varying failure rates of transmission lines. A novel distributionally robust ambiguity set is constructed with a discrete support set, where the impacts of hurricanes are depicted by these supports. To minimize load shedding and dropped workloads, the spatial and temporal demand response capabilities of CDCs, arising from task migration and delay tolerance, are incorporated into resilient management. The flexibility of CDC power consumption is integrated into a two-stage distributionally robust optimization problem with conditional value at risk (CVaR). Based on Lagrange duality, this problem is reformulated into its deterministic counterpart and solved by a novel decomposition method with hybrid cuts, admitting fewer iterations and a faster convergence rate. The effectiveness of the proposed resilient management strategy is verified through case studies on the modified IEEE RTS-24 system, which includes 4 data centers and 5 offshore wind farms.
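The risk measure used in the second stage, conditional value at risk, can be computed empirically as the mean of the worst-case tail of scenario losses; the sketch below uses fabricated scenario data and is only a reminder of the definition, not the paper's optimization.

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical conditional value-at-risk: mean of the losses at or above the
    alpha-quantile (value-at-risk), i.e., the average of the worst scenarios."""
    losses = np.sort(np.asarray(losses))
    var = np.quantile(losses, alpha)       # value-at-risk threshold
    return losses[losses >= var].mean()

# Fabricated load-shedding losses (MWh) over 1000 hurricane scenarios.
scenario_losses = np.random.default_rng(0).gamma(2.0, 50.0, size=1000)
print("CVaR at 95%:", cvar(scenario_losses))
```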
This paper systematically describes the technical principles, evaluation indicators, system forms, and research progress of air-side evaporative cooling air conditioning systems, water-side evaporative cooling air conditioning systems, and refrigerant-side evaporative condensing air conditioning systems for data centers. To reduce the energy consumption of the refrigeration and air-conditioning system in a data center, the application conditions and scenarios of the different forms of evaporative cooling air-conditioning systems should be considered comprehensively, and it is very important that the renewable resource of dry air be used to the greatest extent. These efforts would contribute to China's 2030 "Carbon Peak" and 2060 "Carbon Neutrality" goals.
In Ethernet lossless Data Center Networks (DCNs) deployed with Priority-based Flow Control (PFC), the head-of-line blocking problem is still difficult to prevent because PFC triggers under burst traffic scenarios even with existing congestion control solutions. To address the head-of-line blocking problem of PFC, we propose a new congestion control mechanism. The key point of Congestion Control Using In-Network Telemetry for Lossless Datacenters (ICC) is to use In-Network Telemetry (INT) technology to obtain comprehensive congestion information, which is then fed back to the sender to adjust the sending rate in a timely and accurate manner. With ICC it is possible to control congestion in time, converge to the target rate quickly, and maintain a near-zero queue length at the switch. We conducted Network Simulator-3 (NS-3) simulation experiments to test ICC's performance. Compared to Congestion Control for Large-Scale RDMA Deployments (DCQCN), TIMELY: RTT-based Congestion Control for the Datacenter (TIMELY), and Re-architecting Congestion Management in Lossless Ethernet (PCN), ICC reduces PFC pause messages by 47%, 56%, and 34%, and Flow Completion Time (FCT) by 15.3×, 14.8×, and 11.2×, respectively.
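A sender-side rate update driven by INT feedback might look like the sketch below, cutting the rate when telemetry reports a standing queue and probing upward otherwise; the constants and thresholds are invented, and this is not ICC's actual update rule.

```python
def adjust_rate(rate, queue_len, link_util, line_rate=100.0,
                beta=0.5, additive_step=1.0):
    """Illustrative INT-driven rate update (Gb/s): multiplicative decrease in
    proportion to the reported queue, additive increase bounded by the spare
    bandwidth reported by the telemetry. Not the ICC algorithm itself."""
    if queue_len > 0:
        congestion = min(1.0, queue_len / (queue_len + 50.0))  # normalized signal
        return max(0.1, rate * (1.0 - beta * congestion))
    spare = line_rate * (1.0 - link_util)
    return min(line_rate, rate + min(additive_step, spare))

print(adjust_rate(80.0, queue_len=120, link_util=0.95))  # congested: slow down
print(adjust_rate(40.0, queue_len=0, link_util=0.60))    # idle queue: speed up
```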
The 6th generation mobile network (6G) is a multi-network interconnection and multi-scenario coexistence network, in which multiple network domains break their original fixed boundaries to form connections and convergence. In this paper, with the optimization objective of maximizing network utility while ensuring performance-centric weighted fairness among flows, we design a reinforcement learning-based cloud-edge autonomous multi-domain data center network architecture that achieves single-domain autonomy and multi-domain collaboration. Because the utilities of different flows conflict, the bandwidth fairness allocation problem for various types of flows is formulated by considering different defined reward functions. Regarding the trade-off between fairness and utility, this paper designs corresponding reward functions for the cases where flows undergo abrupt changes and smooth changes. In addition, to accommodate the Quality of Service (QoS) requirements of multiple types of flows, this paper proposes a multi-domain autonomous routing algorithm called LSTM+MADDPG. Introducing a Long Short-Term Memory (LSTM) layer into the actor and critic networks adds information about temporal continuity, further enhancing adaptability to changes in the dynamic network environment. The LSTM+MADDPG algorithm is compared with recent reinforcement learning algorithms through experiments on real network topologies and traffic traces, and the results show that LSTM+MADDPG improves the delay convergence speed by 14.6% and delays the onset of packet loss by 18.2% compared with other algorithms.
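Placing an LSTM layer in front of the policy head is sketched below in PyTorch; the dimensions and structure are illustrative assumptions rather than the exact actor network used by LSTM+MADDPG.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Sketch of an actor with an LSTM layer before the policy head, so routing
    decisions can exploit the temporal continuity of recent traffic observations."""
    def __init__(self, obs_dim=16, hidden=64, n_actions=8):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.policy = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_actions))

    def forward(self, obs_seq, state=None):    # obs_seq: (batch, time, obs_dim)
        out, state = self.lstm(obs_seq, state)
        logits = self.policy(out[:, -1])        # act on the latest hidden state
        return torch.softmax(logits, dim=-1), state

actor = LSTMActor()
probs, _ = actor(torch.randn(2, 10, 16))        # 2 flows, 10 past observations
print(probs.shape)                               # torch.Size([2, 8])
```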
Cloud Datacenter Network (CDN) providers usually have the option to scale their network structures to allow for far greater resource capacities, though such scaling may come with exponential costs that contradict their utility objectives. Besides the cost of the physical assets and network resources, such scaling may also impose more load on the electricity grid to feed the added nodes with the energy required to run and cool them, which brings extra costs too. Thus, CDN providers who utilize their resources better can afford to offer their services at lower price units than those who simply choose scaling solutions. Resource utilization is quite a challenging process; indeed, CDN clients usually tend to exaggerate their true resource requirements when they lease resources. Service providers are committed to their clients through Service Level Agreements (SLAs), so any amendment to the resource allocations needs to be approved by the clients first. In this work, we propose deploying a Stackelberg leadership framework to formulate a negotiation game between cloud service providers and their client tenants, through which the providers seek to retrieve leased but unused resources from their clients. Cooperation cannot be expected from the clients, who may ask high price units to return their extra resources to the provider's premises. Hence, to motivate cooperation in such a non-cooperative game, we developed an incentive-compatible pricing model for the returned resources as an extension of Vickrey auctions. Moreover, we propose building a behavior belief function that shapes the negotiation and compensation for each client. Compared to other benchmark models, the assessment results show that our proposed models provide timely negotiation schemes, allowing for better resource utilization rates, higher utilities, and grid-friendly CDNs.
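The incentive idea behind a Vickrey-style buy-back can be sketched as a reverse second-price auction: the tenant asking the lowest price returns its unused resources but is paid the second-lowest ask, which makes truthful asks the dominant strategy. The code below is an illustration with made-up tenants, not the paper's full incentive-compatible pricing model.

```python
def vickrey_buyback(bids):
    """Reverse second-price sketch: the provider reclaims resources from the
    tenant with the LOWEST asking price per unit but pays the second-lowest ask."""
    # bids: {tenant: asking_price_per_unit}
    ranked = sorted(bids.items(), key=lambda kv: kv[1])
    winner, _ = ranked[0]
    payment = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, payment

print(vickrey_buyback({"tenantA": 3.0, "tenantB": 2.2, "tenantC": 4.1}))
# -> ('tenantB', 3.0): tenantB returns its unused resources and is paid tenantA's ask
```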