Abstract: Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, due to the special issues and goals of the Grid, the traditional approach is no longer effective in this environment. Therefore, it is necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To utilize these two concepts, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, due to limited storage capacity, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when free space is insufficient for the new replica: first, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
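The two-step replacement idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' ADHRS implementation: the `Replica` fields, the `cheap_threshold` cut-off that decides which files count as having a "minimum" transfer time, and the `retention_score` formula combining the four factors are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: float           # storage units occupied by the replica
    transfer_time: float  # estimated time to fetch the file again
    last_access: float    # timestamp of the most recent request
    access_count: int     # number of requests seen so far


def retention_score(r: Replica, now: float) -> float:
    """Hypothetical score (an assumption): higher means more worth keeping."""
    age = max(now - r.last_access, 1e-9)  # avoid division by zero
    return r.access_count * r.transfer_time / (age * r.size)


def choose_victims(replicas, needed, now, cheap_threshold):
    """Pick replicas to delete until `needed` space is freed.

    If the store cannot free enough space, every replica is returned;
    the caller is expected to check the total freed size.
    """
    victims, freed = [], 0.0
    # Step 1: replicas that are cheap to re-fetch, cheapest first.
    cheap = sorted(
        (r for r in replicas if r.transfer_time <= cheap_threshold),
        key=lambda r: r.transfer_time,
    )
    for r in cheap:
        if freed >= needed:
            break
        victims.append(r)
        freed += r.size
    # Step 2: if still short, evict the remaining replicas with the
    # lowest retention score (recency, access count, size, transfer time).
    if freed < needed:
        chosen = {r.name for r in victims}
        rest = sorted(
            (r for r in replicas if r.name not in chosen),
            key=lambda r: retention_score(r, now),
        )
        for r in rest:
            if freed >= needed:
                break
            victims.append(r)
            freed += r.size
    return victims
```

With this scoring, a large, rarely requested, long-idle file that is quick to re-fetch is evicted before a small, popular, recently used one; other weightings of the same four factors would be equally consistent with the abstract.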
Abstract: The cloud computing environment is attracting growing interest as a new trend in data management. Data replication has been widely applied to improve data access in distributed systems such as Grid and Cloud. However, due to the finite storage capacity of each site, copies that are useful for future jobs can be wastefully deleted and replaced with less valuable ones. Therefore, it is important to have an appropriate replication strategy that can dynamically store the replicas while satisfying quality of service (QoS) requirements and storage capacity constraints. In this paper, we present a dynamic replication algorithm named hierarchical data replication strategy (HDRS). HDRS consists of replica creation, which adaptively increases replicas based on an exponential growth or decay rate; replica placement, according to the access load and a labeling technique; and replica replacement, based on the future value of the file. We evaluate different dynamic data replication methods using CloudSim simulation. Experiments demonstrate that HDRS can reduce response time and bandwidth usage compared with other algorithms. This means that HDRS can identify a popular file and replicate it to the best site. The method avoids useless replications and decreases access latency by balancing the load across sites.
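As a rough illustration of replica creation driven by an exponential growth or decay rate, the Python sketch below estimates a rate from access counts in two consecutive intervals and extrapolates one interval ahead. The function names, the add-one smoothing, and the replication threshold are assumptions for illustration; the abstract does not give the actual HDRS formulas.

```python
import math


def growth_rate(prev_count: int, curr_count: int) -> float:
    """Rate r with curr ≈ prev * exp(r); a negative r indicates decay."""
    # Add-one smoothing (an assumption) avoids log(0) for unseen files.
    return math.log((curr_count + 1) / (prev_count + 1))


def predicted_accesses(prev_count: int, curr_count: int) -> float:
    """Extrapolate one interval ahead using the same exponential rate."""
    return curr_count * math.exp(growth_rate(prev_count, curr_count))


def should_replicate(prev_count: int, curr_count: int, threshold: float) -> bool:
    """Create a replica only if predicted demand exceeds the threshold."""
    return predicted_accesses(prev_count, curr_count) > threshold
```

For example, a file whose accesses rise from 9 to 19 has a smoothed rate of ln 2 and a predicted 38 accesses in the next interval, whereas a file falling from 19 to 9 is predicted to drop to 4.5 and would not be replicated under a threshold of 30.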
Funding: Supported by Universiti Putra Malaysia and the Ministry of Education (MOE).
Abstract: As the amount of data continues to grow rapidly, the variety of data produced by applications is becoming richer than ever. Cloud computing is the leading technology evolving today to provide multiple services for this mass and variety of data. Cloud computing features are capable of processing, managing, and storing all sorts of data. Although data is stored in many high-end nodes, either in the same data center or across many data centers in the cloud, performance issues are still inevitable. A cloud replication strategy is one of the best solutions to address the risk of performance degradation in the cloud environment. The real challenge is developing the right data replication strategy with minimal data movement that guarantees efficient network usage, low fault tolerance, and minimal replication frequency. The key problem discussed in this research is the inefficient network usage that arises when selecting a suitable data center to store replica copies, caused by inadequate data center selection criteria. To mitigate the issue, we propose the Replication Strategy with a comprehensive Data Center Selection Method (RS-DCSM), which determines the appropriate data center to place replicas by considering three key factors: popularity, space availability, and centrality. The proposed RS-DCSM was simulated using CloudSim, and the results show that data movement between data centers is significantly reduced, with a 14% reduction in overall replication frequency and a 20% decrease in network usage, outperforming the current replication strategy, the Dynamic Popularity aware Replication Strategy (DPRS) algorithm.
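One way to picture a selection rule that weighs popularity, space availability, and centrality is the Python sketch below. It is only a hedged illustration, not the RS-DCSM algorithm itself: the `DataCenter` fields, the normalization of each factor, and the equal default weights are assumptions, because the abstract names the three factors without saying how they are combined.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    popularity: float  # assumed: normalized request rate for the file, 0..1
    free_space: float  # free storage capacity
    centrality: float  # assumed: normalized closeness to other centers, 0..1


def select_data_center(centers, replica_size, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the highest-scoring center that can hold the replica, or None."""
    # Space availability acts first as a hard constraint.
    candidates = [c for c in centers if c.free_space >= replica_size]
    if not candidates:
        return None
    max_free = max(c.free_space for c in candidates)
    w_pop, w_space, w_cent = weights

    def score(c: DataCenter) -> float:
        # Normalize free space against the roomiest candidate.
        space = c.free_space / max_free if max_free > 0 else 0.0
        return w_pop * c.popularity + w_space * space + w_cent * c.centrality

    return max(candidates, key=score)
```

Treating free space both as a filter and as a score term keeps a nearly full data center from winning on popularity alone, which is one plausible reading of "space availability" as a selection factor.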
Abstract: Most social networks allow connections among many people based on shared interests. Social networks have to deliver shared data such as videos and photos to the group with minimum latency, which can be challenging because storage cost has to be minimized and hence full data replication is not a solution. Replicating data across a read-intensive network can potentially increase savings in cost and energy and reduce the end user's response time. Though simple and adaptive replication strategies exist, the solution is non-deterministic; the replicas of the data need to be optimized for the data usability, performance, and stability of the application systems. To resolve the non-deterministic issue of replication, metaheuristics are applied. In this work, the Harmony Search and Tabu Search algorithms are used to optimize the replication process. A novel Harmony-Tabu search is proposed for effective placement and replication of data. Experiments on large datasets show the effectiveness of the proposed technique. The bandwidth saving of the proposed Harmony-Tabu replication is better by 3.57% to 18.18%, for a varying number of cloud data centers, when compared with the simple replication, Tabu replication, and Harmony replication algorithms.
Abstract: Social networks (SNs) have an extremely large number of users around the world, all sharing data such as images, audio, and video with their friends using IoT devices. This concept is the so-called Social Internet of Things (SIoT). The evolving nature of edge-cloud computing has enabled storage of a large volume of data from various sources, and this task demands an efficient storage procedure. For storing data at this volume, data replication using edge servers with a geo-distributed cloud service area is suited to fulfilling users' expectations of low latency. The major issue is how to store and replicate these large data items optimally and allocate requests from the data center efficiently. For efficient storage of these data, this study uses an edge server, which is part of the cloud server. Thus, the data are distributed and stored with quick access, which reduces response latency. The proposed data placement approach uses a machine learning (ML) algorithm, a support vector machine with a radial basis function kernel (RBF-SVM), to classify the data center for storing the data of users and their friends from the SIoT devices. This learning algorithm is used to predict the workload of the data stored in the data center as either edge or cloud depending on the existing time slots. The dynamic data placement is also optimized using the proposed dynamic graph partitioning (GP) method to meet the individual user's demand for low latency at minimum cost. This keeps the SIoT data placement efficient and effective over time. Accordingly, the proposed data placement and replication approach introduces three innovations compared with existing data placement approaches. (i) Rather than storing the user data in a single cloud, this study uses the edge server closest to the SIoT devices for faster access with reduced response time. (ii) The RBF-SVM classification algorithm is used to find storage for users, reducing data replication. (iii) Dynamic GP is introduced for data placement with reduced latency and minimum cost to accommodate the dynamic nature of the SN. In simulation, this approach achieves a reduced latency of 130 ms and minimum cost compared with existing data placement approaches. Therefore, the proposed data placement with ML-based learning on the edge provides promising results in terms of efficiency, effectiveness, and performance with reduced latency and minimum cost.
Funding: Supported by the Visiting Scholar Foundation of Key Lab in University and the National Lab of Switching Technology and Telecommunication Networks ([2000]123)
Abstract: This paper proposes a new primary lazy update protocol, PTCS (Primary Transaction Commit Schedule). In the PTCS protocol, a serializable primary transaction schedule is generated first, and the secondary transactions are then committed according to that schedule. The PTCS protocol can guarantee serializability if the data copy graph contains no directed cycles. It can also be extended to eliminate all requirements on the data copy graph. Compared with earlier work, the PTCS protocol not only imposes a much weaker requirement on data placement, but also avoids the deadlock caused by transaction waits and extra message overhead. Performance experiments show that the performance degradation caused by the replica management of the PTCS protocol is tolerable.
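The acyclicity condition on the data copy graph can be checked mechanically. The Python sketch below uses Kahn's topological-sort algorithm to test whether a directed graph has no directed cycles. It is not part of the PTCS protocol; the edge representation (a pair `(u, v)` meaning that site `v` holds a secondary copy of an item whose primary copy is at site `u`) is an assumption, since the abstract does not define the graph.

```python
from collections import deque


def is_acyclic(edges) -> bool:
    """Return True if the directed graph given by (u, v) pairs has no cycle."""
    graph, indegree = {}, {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
        indegree[v] = indegree.get(v, 0) + 1
        indegree.setdefault(u, 0)
    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    queue = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Nodes on a cycle never reach indegree 0, so they are never removed.
    return removed == len(graph)
```

A placement such as `[("s1", "s2"), ("s2", "s3")]` passes the check, while `[("s1", "s2"), ("s2", "s1")]` does not and would fall under the extended form of the protocol mentioned in the abstract.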
Funding: The NSF (10661003) of China; the NSF (1012138, 0612163) of Guangdong Ocean University
Abstract: In this paper, an empirical Bayes test for a parameter θ of the two-parameter exponential distribution is investigated with replicated past data. Under some conditions, the asymptotic optimality property is obtained. It is shown that the rate of convergence can be very close to O(N^{-1/2}) in the case where the parameter μ is known.
Funding: Sponsored by the National Natural Science Foundation of China (61202354), the Hi-Tech Research and Development Program of China (2007AA01Z404), and the Scientific & Technological Support Project (Industry) of Jiangsu Province (BE2011189)
Abstract: In recent years, with the rapid development of data-intensive applications, data replication has become an enabling technology for the data grid to improve data availability and reduce file transfer time and bandwidth consumption. The placement of replicas has proven to be the most difficult problem that must be solved to realize data replication. This paper addresses the quality of service (QoS) aware replica placement problem in the data grid and proposes a dynamic-programming-based replica placement algorithm that not only guarantees QoS requirements but also minimizes the overall replication cost, including storage cost and communication cost. Simulation experiments show that the replica placement algorithm outperforms an existing popular replica placement technique in the data grid.