Funding: Supported in part by the National Basic Research Program (973 Program, No. 2015CB352400), NSFC under grant U1401258, and the U.S. NSF under grant CCF-1016966.
Abstract: This paper describes the fundamentals of cloud computing and the current key big-data technologies. We categorize big-data processing as batch-based, stream-based, graph-based, DAG-based, interactive-based, or visual-based according to the processing technique. We highlight the strengths and weaknesses of various big-data cloud processing techniques to help the big-data community select the appropriate processing technique. We also present big-data research challenges and future directions with respect to transportation management systems.
Funding: Supported by Shenzhen Industrial Application Projects of undertaking the National Key Research and Development Program of China under Grant No. CJGJZD20210408091600002, the National Natural Science Foundation of China under Grant No. 62102408, and the Shenzhen Science and Technology Program under Grant No. RCBS20210609104609044.
Abstract: Load balancing is vital for the efficient and long-term operation of cloud data centers. With virtualization, post (reactive) migration of virtual machines (VMs) after allocation is the traditional approach to load balancing and consolidation. However, reactive migration does not easily achieve predefined load-balancing objectives, and it may interrupt services and introduce instability. Therefore, we propose a new approach to load balancing, called Prepartition. It partitions a VM request sequentially into a few sub-requests, each with a start time, an end time, and a capacity demand, and treats each sub-request as a regular VM request. In this way, it can proactively set a bound for each VM request on each physical machine and lets the scheduler prepare before VM migration to reach the predefined load-balancing goal, which supports fine-grained resource allocation. Simulations with real-world traces and synthetic data show that the offline version of our approach (PrepartitionOff) performs 10%–20% better than existing load-balancing baselines under several metrics, including average utilization, imbalance degree, makespan, and Capacity_makespan. We also extend Prepartition to online load balancing. Evaluation results show that our approach also outperforms state-of-the-art online algorithms.
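The partitioning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the names `VMRequest` and `prepartition` are hypothetical, and the bound `max_len` is passed in as an assumed parameter because the abstract does not say how the actual bound is derived.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VMRequest:
    start: int     # start time slot (inclusive)
    end: int       # end time slot (exclusive)
    capacity: int  # capacity demand, e.g. CPU units


def prepartition(req: VMRequest, max_len: int) -> List[VMRequest]:
    """Split req into consecutive sub-requests of at most max_len slots.

    max_len is an assumed, caller-supplied bound; each sub-request keeps
    the original capacity demand and can be scheduled like a regular request.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    parts: List[VMRequest] = []
    t = req.start
    while t < req.end:
        nxt = min(t + max_len, req.end)
        parts.append(VMRequest(t, nxt, req.capacity))
        t = nxt
    return parts
```

For example, a request covering slots 0 to 10 with `max_len=4` yields three sub-requests covering slots 0 to 4, 4 to 8, and 8 to 10, each of which a scheduler could place on a different physical machine.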
Abstract: With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for evaluating cloud performance at the whole-system level. To the best of our knowledge, this paper proposes the first benchmark suite, CloudRank-D, to benchmark and rank cloud computing systems that are shared for running big data applications. We analyze the limitations of previous metrics, e.g., floating point operations, for evaluating a cloud computing system, and propose two simple, complementary metrics: data processed per second and data processed per Joule. We detail the design of CloudRank-D, which considers representative applications, diversity of data characteristics, and dynamic behaviors of both applications and system software platforms. Through experiments, we demonstrate the advantages of our proposed metrics. In several case studies, we evaluate two small-scale deployments of cloud computing systems using CloudRank-D.
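The two metrics named in this abstract can be read as simple ratios. The sketch below is one straightforward interpretation, not the paper's exact definition: the function names are hypothetical, and energy is assumed to be average power multiplied by run time.

```python
def data_processed_per_second(total_bytes: float, runtime_s: float) -> float:
    """Throughput-style metric: bytes of input processed per second."""
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    return total_bytes / runtime_s


def data_processed_per_joule(total_bytes: float, energy_j: float) -> float:
    """Energy-efficiency metric: bytes of input processed per Joule."""
    if energy_j <= 0:
        raise ValueError("energy_j must be positive")
    return total_bytes / energy_j


# Illustrative numbers only (not from the paper): 1 GB processed in 100 s
# at an assumed average power draw of 200 W.
total_bytes = 1e9
runtime_s = 100.0
energy_j = 200.0 * runtime_s  # assumption: energy = average power * time

dps = data_processed_per_second(total_bytes, runtime_s)
dpj = data_processed_per_joule(total_bytes, energy_j)
```

The two values are complementary: a system can rank well on data processed per second while ranking poorly on data processed per Joule if it achieves its speed with a high power draw.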
Funding: Project supported by the National Basic Research Program (973) of China (No. 2015CB352400), the National Natural Science Foundation of China (Nos. 61100220 and U1401258), and the US National Science Foundation (No. CCF-1016966).
Abstract: Smart cities have given significant impetus to managing traffic and using transport networks intelligently. For this reason, intelligent transportation systems (ITSs) and location-based services (LBSs) have become an interesting research area in recent years. Due to the rapid increase of data volume within the transportation domain, a cloud environment is of paramount importance for storing, accessing, handling, and processing such huge amounts of data. A large part of the data within the transportation domain is produced in the form of Global Positioning System (GPS) data. Such data is usually infrequent and noisy, and achieving the quality required by real-time transport applications based on GPS is a difficult task. The map-matching process, which is responsible for the accurate alignment of observed GPS positions onto a road network, plays a pivotal role in many ITS applications. The accuracy of a map-matching strategy depends on the shortest path between two consecutive observed GPS positions. On the other hand, processing shortest path queries (SPQs) incurs a high computational cost. Current map-matching techniques rely on fixed parameters, i.e., the number of candidate points (NCP) and the error circle radius (ECR), which may lead to uncertainty when identifying road segments and to either low-accuracy results or a large number of SPQs. Moreover, due to sampling error, GPS data with a high sampling rate (i.e., a sampling period of less than 10 s) typically contains extraneous data, which also incurs extra SPQs. Due to the high computation cost incurred by SPQs, current map-matching strategies are not suitable for real-time processing. In this paper, we propose real-time map-matching (called RT-MM), a fully adaptive, cloud-based map-matching strategy that addresses the key challenge of SPQs in a map-matching process for real-time GPS trajectories.
The evaluation of our approach against state-of-the-art approaches is performed through simulations based on both synthetic and real-world datasets.
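To make the roles of the NCP and ECR parameters concrete, the sketch below shows a generic candidate-selection step of the kind used in map-matching. It is not RT-MM itself: the function names are hypothetical, coordinates are assumed to be planar (already projected to meters), and the parameter values are fixed by the caller rather than adapted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def project(p: Point, seg: Segment) -> Point:
    """Return the point on seg closest to p (orthogonal projection, clamped)."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return (float(ax), float(ay))  # degenerate segment
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def candidates(p: Point, segments: List[Segment],
               ecr: float, ncp: int) -> List[Tuple[float, Point]]:
    """Keep at most ncp projected points that lie within ecr of p.

    Results are (distance, candidate point) pairs, nearest first.
    """
    found = []
    for seg in segments:
        q = project(p, seg)
        d = math.dist(p, q)
        if d <= ecr:
            found.append((d, q))
    found.sort(key=lambda item: item[0])
    return found[:ncp]
```

Each retained candidate typically requires shortest path queries to the candidates of the next GPS position, so a larger ECR or NCP raises the number of SPQs, while values that are too small risk missing the true road segment.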
Funding: This work is supported by the National Key Research and Development Program of China under Grant No. 2018YFB1004804, the National Natural Science Foundation of China under Grant No. 61702492, the Shenzhen Basic Research Program under Grant Nos. JCYJ20170818153016513 and JCYJ20170307164747920, and the Alibaba Innovative Research (AIR) Project.
Abstract: Workload characterization is critical for resource management and scheduling. Recently, with the fast development of container technology, more and more cloud service providers, such as Google and Alibaba, have adopted containers to provide cloud services because of their low overhead. However, the characteristics of co-located diverse services (e.g., interactive online services and offline computing services) running in containers are still not clear. In this paper, we present a comprehensive analysis of the characteristics of co-located workloads running in containers on the same server from the perspective of hardware events. Our study quantifies and reveals system behavior at the micro-architecture level when workloads run in different co-location patterns. Through the analysis of typical hardware events, we identify recommended and unrecommended co-location workload patterns, which provide valuable deployment suggestions for datacenter administrators.
Abstract: Both performance and energy cost are important concerns for current data center operators. Traditionally, however, IT and mechanical engineers have separately optimized the cyber and physical aspects of data center operations. This paper considers both of these aspects with the eventual goal of developing performance and power management techniques that operate holistically to control the entire cyber-physical complex of data center installations. Toward this end, we propose a balance of payments model for holistic power and performance management. As an example of coordinated cyber-physical system management, the energy-aware cyber-physical system (EaCPS) uses an application controller on the cyber side to guarantee application performance, and on the physical side, it utilizes electric current-aware capacity management (CACM) to smartly place executables to reduce the energy consumption of each chassis present in a data center rack. A web application, representative of a multi-tier web site, is used to evaluate the performance of the controller on the cyber side, the CACM control on the physical side, and the holistic EaCPS methods in a mid-size instrumented data center. Results indicate that coordinated EaCPS outperforms separate cyber and physical control modules.