Abstract
Cloud data processing systems widely replicate data across multiple copies to guard against data loss. If the number or placement of replicas is poorly chosen, data availability can fall below what users expect, or storage space is wasted (for example, when too many copies are made). To address this problem, a data replication optimization model based on fuzzy prediction is proposed, consisting of a fuzzy prediction module and a replication optimization module. The fuzzy prediction module takes node information (CPU, node bandwidth, memory, and hard disk information) as input and predicts the availability of each node. The replication optimization module takes the predicted node availability and the user's expected data availability as input and computes the number and placement of replicas needed to meet that expectation. The proposed model dynamically optimizes data replication according to the availability of data nodes in a cloud storage system and achieves a high storage cost-performance ratio. In simulation experiments, the proposed strategy required 42.62% and 42.84% of the storage space used by the Hadoop strategy, while the average file availability reached 88.69% and 90.54%, respectively, indicating that the model saves storage space while still ensuring file availability.
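The abstract does not give the formula used by the replication optimization module, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that node failures are independent, that a file is available when at least one replica's node is available, and that per-node availabilities have already been predicted. The function names `joint_availability` and `choose_replicas` and the node labels are hypothetical.

```python
from typing import Dict, List


def joint_availability(availabilities: List[float]) -> float:
    """Probability that at least one replica is reachable.

    Assumption (not from the paper): node failures are independent.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def choose_replicas(node_availability: Dict[str, float], expected: float) -> List[str]:
    """Pick the fewest nodes whose joint availability meets the target.

    Nodes are taken in descending order of predicted availability, which
    under the independence assumption yields the smallest replica count.
    """
    ranked = sorted(node_availability.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    for node, _ in ranked:
        chosen.append(node)
        current = joint_availability([node_availability[n] for n in chosen])
        if current >= expected:
            return chosen
    raise ValueError("expected availability cannot be met with the given nodes")


if __name__ == "__main__":
    # Hypothetical predicted availabilities for three data nodes.
    nodes = {"n1": 0.9, "n2": 0.8, "n3": 0.6}
    print(choose_replicas(nodes, 0.97))  # ['n1', 'n2']
```

With these illustrative values, one replica on `n1` gives 0.9 and adding `n2` raises the joint availability to 0.98, so two replicas satisfy a 0.97 target. A fixed three-replica policy would store one more copy than this target requires, which is the kind of saving the abstract describes.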
Source
《计算机技术与发展》
2013, Issue 12, pp. 82-85, 91 (5 pages)
Computer Technology and Development
Funding
Natural Science Foundation of Guangdong Province (10451064101005155, S2011010001754)
Science and Technology Planning Project of Guangdong Province (2010B010600032)