A replication method with balancing read and update overhead

Abstract: Large distributed systems usually replicate stored data to multiple nodes to reduce data access time. However, as the number of replicas grows, so does the write cost of propagating updates to those replicas. An active research topic in distributed storage is how to choose the nodes that store replicas and how many replicas to keep. The goal is to balance read and write overhead and so reduce the system's total data access cost. To address this problem, this paper proposes a data replication method based on a genetic algorithm that balances read and write overhead. Specifically, the genetic algorithm is improved in two respects: (1) an evaluation function that accounts for the data transfer cost of both reads and writes steers the convergence of the genetic algorithm toward an optimal or near-optimal replica placement; (2) a time series forecasting method heuristically guides the chromosome mutation operation, so that the number of replicas follows the trend of read and write accesses. Experiments show that, compared with traditional methods, this method reduces the total time cost of data access more effectively.
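The abstract describes the two improvements only at a high level. Below is a minimal Python sketch of how they might fit together. It relies on several assumptions that do not come from the paper:

- a node-to-node transfer cost matrix `dist`;
- per-node read counts, with each node reading from its nearest replica;
- writes that originate at a single node and are pushed to every replica;
- a bit-vector chromosome (bit i set means node i holds a replica);
- simple exponential smoothing as the time series forecaster;
- a crude rule that favours adding replicas when forecast reads exceed forecast writes.

All names (`access_cost`, `ses_forecast`, `guided_mutation`, `evolve`) are hypothetical. This is not the authors' implementation.

```python
# Hypothetical sketch of a GA for replica placement balancing read and write
# cost. The cost model, forecaster, and mutation rule are assumptions.
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # bit i == 1 means node i stores a replica


def access_cost(chrom: Chromosome, dist: Sequence[Sequence[float]],
                reads: Sequence[float], writes: float, origin: int) -> float:
    """Assumed evaluation function: each node's reads go to its nearest
    replica (by dist), and every write is pushed from origin to every replica."""
    holders = [i for i, bit in enumerate(chrom) if bit]
    if not holders:
        return float("inf")  # no replica at all: infeasible placement
    read_cost = sum(r * min(dist[n][h] for h in holders)
                    for n, r in enumerate(reads))
    write_cost = writes * sum(dist[origin][h] for h in holders)
    return read_cost + write_cost


def ses_forecast(series: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast by simple exponential smoothing."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def guided_mutation(chrom: Chromosome, grow: bool, rate: float,
                    rng: random.Random, against: float = 0.25) -> Chromosome:
    """Bit-flip mutation biased by the forecast: flips in the favoured
    direction (add a replica if grow, else drop one) happen with probability
    rate; flips the other way happen with the smaller rate * against."""
    out = list(chrom)
    for i, bit in enumerate(chrom):
        favoured = (bit == 0) == grow
        if rng.random() < (rate if favoured else rate * against):
            out[i] = 1 - bit
    if not any(out):
        out[rng.randrange(len(out))] = 1  # repair: keep at least one replica
    return out


def evolve(dist, reads, writes, origin, read_hist, write_hist,
           pop_size=20, generations=50, rate=0.1,
           seed=0) -> Tuple[Chromosome, float]:
    """Elitist GA: the cheaper half of each generation survives unchanged;
    children come from one-point crossover followed by guided mutation."""
    rng = random.Random(seed)
    n = len(dist)
    if n < 2 or pop_size < 4:
        raise ValueError("need at least 2 nodes and a population of 4")
    # Hypothetical heuristic: more reads than writes expected -> favour replicas.
    grow = ses_forecast(read_hist) > ses_forecast(write_hist)

    def cost(c: Chromosome) -> float:
        return access_cost(c, dist, reads, writes, origin)

    pop = []
    for _ in range(pop_size):
        c = [rng.randint(0, 1) for _ in range(n)]
        if not any(c):
            c[rng.randrange(n)] = 1
        pop.append(c)
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            children.append(guided_mutation(a[:cut] + b[cut:], grow, rate, rng))
        pop = parents + children
    best = min(pop, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    # Toy instance: three nodes on a line, reads at both ends, writes from node 0.
    dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    best, c = evolve(dist, reads=[10, 0, 10], writes=1, origin=0,
                     read_hist=[8, 10, 12], write_hist=[2, 1, 1])
    print("replica bits:", best, "cost:", c)
```

Keeping the cheaper half of each generation means the best cost never gets worse from one generation to the next. Letting mutation still flip bits against the forecast, at a reduced rate, keeps the search from locking into only ever adding (or only ever removing) replicas.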

Authors: Wang Zhe, Chen Wen, Hu Xinlei, Zou Yang

Affiliation: College of Computer Science, Sichuan University

Source: Journal of Sichuan University (Natural Science Edition), 2013, no. 1, pp. 56-60 (5 pages). Indexed in CAS, CSCD, and the Peking University core journal list.

Funding: National Natural Science Foundation of China (61173159)

Keywords: data replication; read and update overhead; distributed system; replica number

CLC classification: TP309 [Automation and Computer Technology: Computer System Architecture]
