Abstract
Large distributed systems usually replicate stored data across multiple nodes to reduce data access time. However, as the number of replicas grows, the write cost of updating them grows as well. How to choose the nodes that store replicas and control the number of replicas so as to balance read and write overhead, and thereby reduce the system's total data access cost, is an active research topic in distributed storage. To address this problem, this paper proposes a data replication method based on a genetic algorithm that balances read and write overhead. Specifically, the genetic algorithm is improved in two respects: (1) an evaluation function that jointly considers the data transfer cost of reads and writes is built to steer the algorithm's convergence toward an optimal or near-optimal replica placement; (2) a time series forecasting method heuristically guides the chromosome mutation operation, so that the number of replicas adapts to the trend of read and write accesses. Experiments show that, compared with traditional methods, the proposed method reduces the total time cost of data access more effectively.
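The two improvements named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's actual formulation: the cost model (each read goes to the nearest replica, each write updates every replica), the use of simple exponential smoothing as the forecasting method, and all function and parameter names (`access_cost`, `smooth`, `mutate`, `evolve`, `dist`, `reads`, `writes`).

```python
import random

def access_cost(placement, dist, reads, writes):
    """Assumed evaluation function: total transfer cost of a placement.

    placement is a tuple of 0/1 flags, one per node (1 = node holds a replica).
    """
    replicas = [j for j, bit in enumerate(placement) if bit]
    if not replicas:
        return float("inf")  # at least one copy must exist
    cost = 0.0
    for i in range(len(placement)):
        cost += reads[i] * min(dist[i][j] for j in replicas)   # read nearest copy
        cost += writes[i] * sum(dist[i][j] for j in replicas)  # update every copy
    return cost

def smooth(history, alpha=0.5):
    """Simple exponential smoothing (assumed forecaster); one-step-ahead value."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def mutate(placement, read_hist, write_hist, rng):
    """Flip one bit: add a replica if reads are forecast to dominate, else drop one."""
    grow = smooth(read_hist) > smooth(write_hist)
    if not grow and sum(placement) <= 1:
        return placement  # never remove the last replica
    target = 0 if grow else 1  # bits eligible for flipping
    candidates = [j for j, bit in enumerate(placement) if bit == target]
    if not candidates:
        return placement
    j = rng.choice(candidates)
    return placement[:j] + (1 - target,) + placement[j + 1:]

def evolve(dist, reads, writes, read_hist, write_hist, pop=20, gens=50, seed=0):
    """Plain elitist genetic algorithm over placements (requires >= 2 nodes)."""
    rng = random.Random(seed)
    n = len(dist)
    key = lambda p: access_cost(p, dist, reads, writes)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=key)
        parents = population[: pop // 2]           # keep the cheaper half
        children = []
        for _ in range(pop - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:
                child = mutate(child, read_hist, write_hist, rng)
            children.append(child)
        population = parents + children
    return min(population, key=key)
```

With three nodes on a line (`dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]`), read rates of 10 and write rates of 1 per node, this cost model prices a single replica on the middle node at 22 and a replica on every node at 8, which shows how a read-heavy workload favors more replicas under these assumptions.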
Source
《四川大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2013, Issue 1, pp. 56-60 (5 pages)
Journal of Sichuan University(Natural Science Edition)
Funding
National Natural Science Foundation of China (61173159)
Keywords
data replication, read and update overhead, distributed system, replica number