Abstract: Cloud computing is considered a more cost-effective way to deploy scientific workflows. The individual tasks of a scientific workflow require a diverse set of large datasets that are spatially distributed across different datacenters, resulting in long delays during data transmission. Edge computing minimizes these transmission delays and supports a fixed storage strategy for the private datasets of scientific workflows. However, this fixed storage strategy creates a severe storage-capacity bottleneck. Integrating the merits of cloud computing and edge computing while rationalizing the data placement of scientific workflows, and optimizing the energy and time incurred in data transmission across different datacenters, therefore remains a challenge. In this paper, the Adaptive Cooperative Foraging and Dispersed Foraging Strategies-Improved Harris Hawks Optimization Algorithm (ACF-DFS-HHOA) is proposed to optimize the energy and data-transmission time incurred when placing the data of a specific scientific workflow. ACF-DFS-HHOA takes the factors influencing the transmission delay and energy consumption of data centers into account while rationalizing the data placement of scientific workflows. The adaptive cooperative and dispersed foraging strategies are included in HHOA to guide position updates, improving population diversity and effectively preventing the algorithm from being trapped in local optima. The experimental results confirm that ACF-DFS-HHOA minimizes the energy and data-transmission time incurred during workflow execution.
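A rough sketch of the Harris Hawks position-update loop with a dispersed-foraging jump may help make the mechanism concrete. Everything here (the parameter values, the Gaussian jump, and the toy sphere fitness standing in for the energy-plus-delay cost) is an illustrative assumption rather than the paper's exact ACF-DFS-HHOA:

```python
import numpy as np

def hho_sketch(fitness, dim, n_hawks=20, iters=200, lb=-10.0, ub=10.0,
               disperse_prob=0.1):
    """Simplified Harris Hawks Optimization loop with a dispersed-foraging
    random jump for diversity (illustrative, not the paper's exact method)."""
    X = np.random.uniform(lb, ub, (n_hawks, dim))
    best = min(X, key=fitness).copy()
    best_score = fitness(best)
    for t in range(iters):
        E1 = 2.0 * (1.0 - t / iters)                 # escaping energy decays over time
        for i in range(n_hawks):
            E = E1 * (2.0 * np.random.rand() - 1.0)
            if abs(E) >= 1.0:                        # exploration: follow a random hawk
                rand_hawk = X[np.random.randint(n_hawks)]
                X[i] = rand_hawk - np.random.rand() * np.abs(
                    rand_hawk - 2.0 * np.random.rand() * X[i])
            else:                                    # exploitation: besiege the best (prey)
                X[i] = best - E * np.abs(best - X[i])
            if np.random.rand() < disperse_prob:     # dispersed foraging: random jump
                X[i] += np.random.normal(0.0, 0.05 * (ub - lb), dim)
            X[i] = np.clip(X[i], lb, ub)
            score = fitness(X[i])
            if score < best_score:
                best, best_score = X[i].copy(), score
    return best, best_score

# Toy usage: a sphere function stands in for the energy + transfer-time cost.
print(hho_sketch(lambda x: float(np.sum(x ** 2)), dim=5))
```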
Funding: Supported by the National Natural Science Foundation of China (Nos. 60903137, 60970132).
Abstract: When a workflow task needs several datasets from different locations in the cloud, data transfer becomes a challenge. To avoid unnecessary data transfer, a graph-based data placement algorithm for cloud workflows is proposed. The algorithm uses an affinity graph to group datasets while keeping a polynomial time complexity. By integrating the algorithm, the workflow engine can intelligently select the locations in which data will reside, avoiding unnecessary data transfer during both the initial stage and the runtime stage. Simulations show that the proposed algorithm can effectively reduce data transfer during the workflow's execution.
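To make the affinity-graph idea concrete, the following minimal sketch counts how often two datasets are requested by the same task and then merges the heaviest pairs greedily under an assumed per-datacenter capacity; it illustrates the grouping principle, not the paper's exact algorithm:

```python
from itertools import combinations
from collections import defaultdict

def build_affinity(tasks):
    """tasks: list of sets of dataset ids used by each task.
    Edge weight = how many tasks need both datasets together."""
    affinity = defaultdict(int)
    for datasets in tasks:
        for a, b in combinations(sorted(datasets), 2):
            affinity[(a, b)] += 1
    return affinity

def greedy_group(affinity, capacity, sizes):
    """Merge dataset pairs in decreasing affinity order while the merged
    group still fits in one datacenter (capacity check is illustrative)."""
    group = {d: {d} for d in sizes}
    for (a, b), _ in sorted(affinity.items(), key=lambda kv: -kv[1]):
        ga, gb = group[a], group[b]
        if ga is not gb and sum(sizes[d] for d in ga | gb) <= capacity:
            merged = ga | gb
            for d in merged:
                group[d] = merged          # all members share the merged group
    return {frozenset(g) for g in group.values()}

tasks = [{'d1', 'd2'}, {'d2', 'd3'}, {'d1', 'd2', 'd3'}, {'d4'}]
sizes = {'d1': 2, 'd2': 3, 'd3': 4, 'd4': 5}
print(greedy_group(build_affinity(tasks), capacity=10, sizes=sizes))
```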
Abstract: Social networks (SNs) host an extremely large number of users around the world, all sharing data such as images, audio, and video with their friends through IoT devices. This concept is the so-called Social Internet of Things (SIoT). The evolving nature of edge-cloud computing has enabled the storage of large volumes of data from various sources, a task that demands an efficient storage procedure. For such large volumes of data, replication across edge servers within a geo-distributed cloud service area is well suited to meeting users' expectations of low latency. The major issues are how to store and replicate these large data items optimally and how to allocate requests across data centers efficiently. For efficient storage, this study uses edge servers, which are part of the cloud infrastructure; the data are thus distributed and stored with quick access, reducing response latency. The proposed data placement approach uses a machine learning (ML) algorithm, a radial basis kernel function assisted support vector machine (RBF-SVM), to classify the data center in which to store user and friend data from the SIoT devices. The learning algorithm predicts the workload of data stored in the data center as either edge or cloud depending on the existing time slots. Data placement is further optimized with the proposed dynamic graph partitioning (GP) method to meet individual users' demands for low latency at minimum cost, keeping SIoT data placement efficient and effective over time. Accordingly, the proposed data placement and replication approach introduces three innovations compared with existing approaches: (i) rather than storing user data in a single cloud, this study uses the edge server closest to the SIoT devices for faster access and reduced response time; (ii) the RBF-SVM classification algorithm is used to select storage for a user, reducing data replication; and (iii) dynamic GP is introduced for data placement with reduced latency and minimum cost, accommodating the dynamic nature of the SN. In simulation, this approach achieves a reduced latency of 130 ms and lower cost than existing data placement approaches. The proposed ML-based data placement on the edge therefore provides promising results in terms of efficiency, effectiveness, and performance.
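A minimal sketch of the RBF-SVM classification step is shown below, using scikit-learn's SVC with an RBF kernel; the feature set (request rate, data size, device distance) and the labeling rule are invented placeholders, not the paper's training data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical features per data item: [request rate, data size, device distance].
rng = np.random.default_rng(0)
X = rng.random((200, 3))
# Assumed labeling rule: hot, small data -> edge (1), otherwise cloud (0).
y = ((X[:, 0] > 0.5) & (X[:, 1] < 0.5)).astype(int)

clf = SVC(kernel='rbf', gamma='scale').fit(X, y)   # RBF-kernel SVM classifier
print(clf.predict([[0.9, 0.2, 0.1]]))              # frequently requested, small -> edge
```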
Funding: Supported by the National Natural Science Foundation of China (Nos. 61320106007, 61572129, 61502097, and 61370207); the National High-Tech Research and Development (863) Program of China (No. 2013AA013503); the International S&T Cooperation Program of China (No. 2015DFA10490); the Jiangsu research prospective joint research project (No. BY2013073-01); the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201); the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9); the Collaborative Innovation Center of Novel Software Technology and Industrialization; and the Collaborative Innovation Center of Wireless Communications Technology.
Abstract: Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications whose massive scientific datasets are stored in globally distributed scientific data centers and accessed at high frequency by scientists worldwide. Multiple associated data items distributed across different scientific data centers may be requested for one data processing task, and data placement decisions must respect the storage capacity limits of those data centers. Optimizing data access cost when placing data items in globally distributed scientific data centers has therefore become an increasingly important goal. Existing placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by associated data access, or they overlook storage capacity limitations, a very practical constraint of scientific data centers. In this paper, inspired by applications in high energy physics, we propose an integer-programming-based data placement model that addresses the above challenges as a Non-deterministic Polynomial-time (NP)-hard problem, and we use a Lagrangian relaxation-based heuristic algorithm to obtain near-optimal data placement solutions. Our simulation results demonstrate that the algorithm is effective and significantly reduces overall data access cost.
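The following toy enumeration illustrates the model's objective and constraints (access cost plus a penalty for splitting associated items, subject to storage capacity limits). Note that it brute-forces a tiny instance rather than using the paper's Lagrangian relaxation heuristic, and all numbers are invented:

```python
from itertools import product

# Toy instance (illustrative numbers): place items in data centers to
# minimize access cost subject to storage capacity limits.
items = {'A': 4, 'B': 3, 'C': 5}             # item -> size
centers = {'dc1': 8, 'dc2': 9}               # center -> capacity
access = {('A', 'dc1'): 1, ('A', 'dc2'): 5,  # (item, center) -> access cost
          ('B', 'dc1'): 4, ('B', 'dc2'): 2,
          ('C', 'dc1'): 3, ('C', 'dc2'): 3}
pairs = [('A', 'B')]                         # associated items; splitting them
assoc_penalty = 6                            # across centers costs extra

best = None
for choice in product(centers, repeat=len(items)):
    plan = dict(zip(items, choice))
    used = {c: 0 for c in centers}
    for it, c in plan.items():
        used[c] += items[it]
    if any(used[c] > centers[c] for c in centers):
        continue                             # violates a storage capacity limit
    cost = sum(access[(it, c)] for it, c in plan.items())
    cost += sum(assoc_penalty for a, b in pairs if plan[a] != plan[b])
    if best is None or cost < best[0]:
        best = (cost, plan)
print(best)
```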
Funding: Supported by the National Science and Technology Foundation of China (61572377); the Natural Science Foundation of Hubei Province (2014CFB239); the Open Fund from HPCL (201512-02); the Open Fund from SKLSE (2015-A-06); and the US National Science Foundation (CNS-1162540).
Abstract: By moving computations from computing nodes to storage nodes, active storage technology provides an efficient approach for data-intensive high-performance computing applications. Existing studies have neglected the effect of storage-node heterogeneity on the performance of active storage systems. We introduce CADP, a capability-aware data placement scheme for heterogeneous active storage systems that delivers high-performance data processing. The basic idea of CADP is to place data on storage nodes according to their computing and storage capabilities, so that load imbalance among heterogeneous servers is avoided. We have implemented CADP in a parallel I/O system. The experimental results show that the proposed capability-aware data placement scheme improves active storage system performance significantly.
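A minimal sketch of the capability-aware idea: assign data chunks to nodes in proportion to a combined compute-and-storage score. The scoring weights and node figures are assumptions, not CADP's actual policy:

```python
def capability_aware_placement(chunks, nodes):
    """Assign chunks to storage nodes in proportion to a combined
    compute + storage capability score (sketch of the CADP idea;
    the 50/50 weighting is an assumption)."""
    score = {n: 0.5 * v['compute'] + 0.5 * v['storage'] for n, v in nodes.items()}
    total = sum(score.values())
    share = {n: s / total for n, s in score.items()}      # target fraction per node
    placement, assigned = {}, {n: 0 for n in nodes}
    for c in chunks:
        # pick the node currently furthest below its target share
        n = min(nodes, key=lambda n: assigned[n] - share[n] * (sum(assigned.values()) + 1))
        placement[c] = n
        assigned[n] += 1
    return placement

nodes = {'fast': {'compute': 8, 'storage': 6}, 'slow': {'compute': 2, 'storage': 4}}
print(capability_aware_placement([f'chunk{i}' for i in range(10)], nodes))
```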
Funding: Supported by the Science and Technology Project of Minhang District, Shanghai (No. 2018MH331).
Abstract: The 3-replica redundancy strategy is widely used to ensure data reliability in large-scale distributed storage systems, but its storage capacity utilization is only 33%. In this paper, a data placement algorithm based on fault-tolerant domains (FTDs) is proposed. Owing to the fine-grained design of the FTD, the data reliability of systems using two replicas is comparable to that of current mainstream systems using three replicas, while capacity utilization is increased to 50%. Moreover, the proposed FTD provides a new concept for the design of distributed storage systems: they can take FTDs as the units for data placement, data migration, data repair, and so on. In addition, fault detection can be performed independently and concurrently within FTDs.
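The core placement rule, that the two replicas must land in distinct fault-tolerant domains so that no single domain failure loses both copies, can be sketched as follows; the FTD layout and the per-object seeding are illustrative assumptions:

```python
import random

def place_replicas(obj_id, ftds, n_replicas=2):
    """Place replicas of an object in distinct fault-tolerant domains (FTDs)
    so that no single domain failure loses every copy (illustrative sketch)."""
    if n_replicas > len(ftds):
        raise ValueError("need at least one distinct FTD per replica")
    rng = random.Random(obj_id)                # deterministic per object id
    chosen = rng.sample(sorted(ftds), n_replicas)
    return {ftd: rng.choice(ftds[ftd]) for ftd in chosen}   # one node per FTD

ftds = {'ftd-0': ['n0', 'n1'], 'ftd-1': ['n2', 'n3'], 'ftd-2': ['n4']}
print(place_replicas('object-42', ftds))
```

With two replicas instead of three, usable capacity rises from 33% to 50%, which is the trade-off the FTD design makes safe.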
Funding: Supported by the Fok Ying Tong Education Foundation (No. 104030); the Key Program of the National Natural Science Foundation of China (No. 70531020); the Program for New Century Excellent Talents (No. NCET-06-0382); the Key Project of the Ministry of Education of China (No. 306023); and the Doctoral Education Program (No. 20070247075).
Abstract: We present a novel paradigm for sensor placement concerning data precision and estimation. Multiple abstract sensors are used to measure a quantity of a moving target in a wireless sensor network; the sensors cooperate to obtain a precise estimate of the quantity in real time. We consider the problem of planning a minimum-cost sensor placement scheme with the desired data precision under resource-consumption constraints. Measured data are modeled as Gaussian random variables with changeable variances, and a grid model is used to approximate the problem. We solve the problem with a heuristic algorithm combining the branch-and-bound method and tabu search. Our experiments demonstrate that the algorithm is correct within a given tolerance, efficient, and scalable.
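A greedy sketch of the precision constraint is given below: for independent Gaussian sensors the fused-estimate variance is 1/sum(1/var_i), and sensors are added cheapest-first until the target variance is met. The paper's branch-and-bound plus tabu search would explore placements far more thoroughly; this only illustrates the constraint:

```python
def min_cost_placement(candidates, target_var):
    """Greedy sketch: add the cheapest sensors until the fused-estimate
    variance of independent Gaussian sensors meets the target precision."""
    chosen, inv_var = [], 0.0
    for cost, var in sorted(candidates):          # cheapest first
        chosen.append((cost, var))
        inv_var += 1.0 / var                      # information adds up
        if 1.0 / inv_var <= target_var:
            return chosen, 1.0 / inv_var
    return None                                   # target precision unreachable

candidates = [(1, 4.0), (2, 2.0), (3, 1.0)]       # (cost, measurement variance)
print(min_cost_placement(candidates, target_var=0.9))
```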
Abstract: With the development of computerized business applications, the amount of data is increasing exponentially. Cloud computing provides high-performance computing resources and mass storage for massive data processing. In distributed cloud computing systems, data-intensive computing can lead to data scheduling between data centers. Reasonable data placement can effectively reduce such scheduling and improve users' data acquisition efficiency. In this paper, a mathematical model of data scheduling between data centers is built. By means of the global optimization ability of the genetic algorithm, generational evolution produces progressively better approximate solutions and ultimately yields the best approximation of the data placement. The experimental results show that the genetic algorithm can effectively find an approximately optimal data placement and minimize data scheduling between data centers.
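A tiny genetic algorithm over placement chromosomes illustrates the approach; the cost function, which charges one unit whenever two related datasets are separated, and all parameters are invented stand-ins for the paper's scheduling model:

```python
import random

def ga_placement(n_items, n_centers, cost, pop=30, gens=60, pm=0.1):
    """Tiny genetic algorithm: a chromosome assigns each dataset to a
    data center; fitness is the (assumed) inter-center scheduling cost."""
    def crossover(a, b):
        cut = random.randrange(1, n_items)
        return a[:cut] + b[cut:]
    def mutate(c):
        return [random.randrange(n_centers) if random.random() < pm else g for g in c]
    popn = [[random.randrange(n_centers) for _ in range(n_items)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=cost)
        elite = popn[:pop // 2]                    # truncation selection
        popn = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                        for _ in range(pop - len(elite))]
    return min(popn, key=cost)

# Toy cost: datasets 0 & 1 and 2 & 3 are related; separating a related
# pair across centers incurs one unit of scheduling cost.
related = [(0, 1), (2, 3)]
cost = lambda c: sum(c[i] != c[j] for i, j in related)
print(ga_placement(n_items=4, n_centers=3, cost=cost))
```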
Abstract: The recently proposed data-driven pole placement method can use measurement data to simultaneously identify a state-space model and derive a pole placement state feedback gain. It achieves this precisely for systems that are linear time-invariant and for which noiseless measurement datasets are available. However, for nonlinear systems, or when the only measurement datasets available contain noise, the approach fails to yield satisfactory results. In this study, we investigated the effect on data-driven pole placement performance of introducing a prefilter to reduce the noise present in the datasets. Using numerical simulations of a self-balancing robot, we demonstrated the important role prefiltering can play in reducing the interference caused by noise.
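A minimal sketch of the prefiltering step follows, using a simple moving average as the noise-reduction filter; the window size and signal are assumptions, and the study's actual filter design may differ:

```python
import numpy as np

def prefilter(y, window=5):
    """Moving-average prefilter sketch: smooth noisy measurements before
    they are fed to identification / pole placement."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode='same')

t = np.linspace(0.0, 2.0 * np.pi, 200)
truth = np.sin(t)                                 # stand-in for the true signal
noisy = truth + 0.3 * np.random.default_rng(1).standard_normal(t.size)
# The measurement error shrinks after filtering:
print(np.std(noisy - truth), np.std(prefilter(noisy) - truth))
```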
Abstract: To solve the data placement problem of scientific workflows in hybrid cloud environments, a security-oriented placement strategy for scientific workflows is proposed that takes the security requirements of the data into account and aims to optimize cross-datacenter transmission delay. The security requirements of datasets and the security services that data centers can provide are analyzed, and security-level grading rules are proposed. An adaptive particle swarm optimization algorithm based on simulated annealing and genetic algorithms (SAGA-PSO) is designed, which prevents the algorithm from falling into local extrema and effectively improves population diversity. Compared with other classical placement algorithms, the SAGA-PSO-based data placement strategy satisfies the data security requirements while greatly reducing transmission delay.
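A compact sketch of a PSO loop augmented with a simulated-annealing acceptance test and a GA-style mutation conveys the SAGA-PSO idea; the update rules and parameters here are illustrative assumptions, not the paper's exact algorithm:

```python
import math, random

def saga_pso(fit, dim, n=20, iters=100, w=0.7, c1=1.5, c2=1.5, t0=1.0):
    """PSO sketch with an SA acceptance test and GA-style mutation to
    preserve diversity (illustrative, not the paper's exact SAGA-PSO)."""
    X = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    P = [x[:] for x in X]                        # personal bests
    g = min(P, key=fit)[:]                       # global best
    for t in range(iters):
        temp = t0 * (1 - t / iters) + 1e-9       # annealing temperature
        for i in range(n):
            for d in range(dim):
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (P[i][d] - X[i][d])
                           + c2 * random.random() * (g[d] - X[i][d]))
                X[i][d] += V[i][d]
                if random.random() < 0.05:       # GA-style mutation
                    X[i][d] += random.gauss(0, 0.1)
            d_fit = fit(X[i]) - fit(P[i])
            # SA acceptance: occasionally keep a worse point for diversity
            if d_fit < 0 or random.random() < math.exp(-d_fit / temp):
                P[i] = X[i][:]
            if fit(P[i]) < fit(g):
                g = P[i][:]
    return g

print(saga_pso(lambda x: sum(v * v for v in x), dim=4))
```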
Funding: Supported by the Open Research Fund of the Key Laboratory of Space Utilization, Chinese Academy of Sciences (No. LSU-KFJJ-2018-06) and the International Research Cooperation Seed Fund of Beijing University of Technology (No. 2018B41).
Abstract: The controller placement problem (CPP) is a critical issue in software-defined wireless networks (SDWN). Owing to the limited power of wireless devices, CPP faces the challenge of energy efficiency in SDWN; nevertheless, related research on CPP in SDWN has not yet modeled the energy consumption of controllers. To prolong the lifetime of SDWN and improve the practicality of this research, we rebuild the CPP model to account for the minimal transmit power of controllers. An adaptive controller placement algorithm (ACPA) is proposed with the following two stages. First, the data field method is adopted to determine sub-networks for different network topologies. Second, within each sub-network an exhaustive method finds the optimal location, the one requiring the minimal average transmit power, at which to place the controller. Compared with other algorithms, the effectiveness and efficiency of the proposed scheme are validated through simulation.
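The second stage, exhaustively trying each node of a sub-network as the controller site and keeping the one with minimal average transmit power, can be sketched as follows; the squared-distance power model is an assumption:

```python
def best_controller(nodes):
    """Exhaustive placement sketch: within one sub-network, try every node
    as the controller and pick the one minimizing average required transmit
    power (assumed proportional to squared distance)."""
    def power(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(nodes, key=lambda c: sum(power(c, n) for n in nodes) / len(nodes))

sub_network = [(0, 0), (1, 2), (2, 1), (5, 5)]   # node coordinates (toy data)
print(best_controller(sub_network))
```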
Abstract: This paper presents a study on placing territorial resources for multipurpose wireless services while respecting the restrictions imposed by the orography of the territory itself. To solve this problem, genetic algorithms are used to identify sites where resources should be placed for optimal coverage of a given area. The algorithm has proven able to find optimal solutions in a variety of situations.