Abstract: Data centers are being distributed worldwide by cloud service providers (CSPs) to save energy costs through efficient workload allocation strategies. Many CSPs are challenged by the significant rise in user demand, which drives extensive energy consumption during workload processing. Numerous research studies have examined distinct operating-cost mitigation techniques for geo-distributed data centers (DCs). However, operating-cost savings during workload processing that also take string-matching techniques into account in geo-distributed DCs remain unexplored. In this research, we propose a novel string-matching-based geographical load balancing (SMGLB) technique to mitigate the operating cost of geo-distributed DCs. The primary goal of this study is to use a string-matching algorithm (i.e., Boyer-Moore) to compare the contents of incoming workloads with those of documents that have already been processed in a data center. On a successful match, the global load balancer does not send the user's request to a data center for processing; instead, the results of the previously processed workload are displayed to the user, which saves energy. If no match is found, the global load balancer allocates the incoming workload to a specific DC for processing, considering variable energy prices, the number of active servers, on-site green energy, and traces of incoming workload. The results of numerical evaluations show that SMGLB reduces the operating expenses of geo-distributed data centers more than existing workload distribution techniques.
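The match-then-dispatch flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The search uses only the bad-character rule (a simplified Boyer-Moore, without the good-suffix rule). The `DataCenter` fields, the `cost` formula, the `route` function, and the rule that a "match" means the request text occurs inside a stored document are all assumptions made for illustration.

```python
from dataclasses import dataclass


def boyer_moore_find(text: str, pattern: str) -> int:
    """Return the first index of pattern in text, or -1 if absent.

    Simplified Boyer-Moore: bad-character rule only.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Rightmost position of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s
        # Shift so the mismatched text character lines up with its
        # rightmost occurrence in the pattern (at least one step).
        s += max(1, j - last.get(text[s + j], -1))
    return -1


@dataclass
class DataCenter:
    # Hypothetical fields standing in for the factors named in the abstract.
    name: str
    energy_price: float       # assumed unit: currency per kWh
    green_energy_kwh: float   # on-site green energy currently available
    active_servers: int


def cost(dc: DataCenter, workload_kwh: float) -> float:
    """Assumed cost model: only energy not covered by green supply is paid for."""
    return max(0.0, workload_kwh - dc.green_energy_kwh) * dc.energy_price


def route(request: str, processed: dict[str, str],
          dcs: list[DataCenter], workload_kwh: float) -> tuple[str, str]:
    """Serve a stored result on a match; otherwise pick the cheapest DC."""
    for document, result in processed.items():
        if boyer_moore_find(document, request) != -1:
            return ("cache", result)
    candidates = [dc for dc in dcs if dc.active_servers > 0]
    if not candidates:
        raise ValueError("no data center has active servers")
    best = min(candidates, key=lambda dc: cost(dc, workload_kwh))
    return ("dispatch", best.name)
```

For example, with one stored document `"sort the quarterly sales report"`, a request containing `"quarterly sales"` would be answered from the stored result, while an unrelated request would be dispatched to whichever data center has the lowest assumed cost.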
Funding: Supported by the National Natural Science Foundation of China (Nos. 61320106007, 61572129, 61502097, and 61370207); the National High-Tech Research and Development (863) Program of China (No. 2013AA013503); the International S&T Cooperation Program of China (No. 2015DFA10490); the Jiangsu research prospective joint research project (No. BY2013073-01); the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201); the Key Laboratory of Computer Network and Information Integration of Ministry of Education of China (No. 93K-9); and the Collaborative Innovation Center of Novel Software Technology and Industrialization and the Collaborative Innovation Center of Wireless Communications Technology.
Abstract: Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications in which massive scientific datasets are stored in globally distributed scientific data centers and accessed frequently by scientists worldwide. A single data processing task may request multiple associated data items distributed across different scientific data centers, and data placement decisions must respect the storage capacity limits of those data centers. Optimizing data access cost when placing data items in globally distributed scientific data centers has therefore become an increasingly important goal. Existing data placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by associated data access or overlook storage capacity limitations, which are a very practical constraint of scientific data centers. In this paper, inspired by applications in the field of high energy physics, we propose an integer-programming-based data placement model that addresses the above challenges as a non-deterministic polynomial-time (NP)-hard problem. In addition, we use a Lagrangian-relaxation-based heuristic algorithm to obtain ideal data placement solutions. Our simulation results demonstrate that our algorithm is effective and significantly reduces overall data access cost.
Funding: Supported by the National Natural Science Foundation of China (No. 61972118) and the Key R&D Program of Zhejiang Province (No. 2023C01028).
Abstract: Cloud service providers generally co-locate online services and batch jobs on the same computer cluster, where resources can be pooled to maximize data center resource utilization. Because batch jobs and online services compete for resources, co-location frequently impairs the performance of online services. This study presents a quality of service (QoS) prediction-based scheduling model (QPSM) for co-located workloads. The performance prediction of QPSM consists of two parts: the prediction of an online service's QoS anomaly based on XGBoost, and the prediction of the completion time of an offline batch job based on random forest. Online service QoS anomaly prediction is used to evaluate the influence of the batch job mix on online service performance, and batch job completion time prediction is used to reduce the total waiting time of batch jobs. When the same number of batch jobs are scheduled in experiments using typical test sets such as CloudSuite, the scheduling time required by QPSM is reduced by about 6 h on average compared with the first-come, first-served strategy and by about 11 h compared with the random scheduling strategy. Compared with the non-co-located situation, QPSM improves CPU resource utilization by 12.15% and memory resource utilization by 5.7% on average. Experiments show that the QPSM scheduling strategy proposed in this study can effectively guarantee the quality of online services and further improve cluster resource utilization.