
Frequency and Similarity-Aware Partitioning for Cloud Storage Based on Space-Time Utility Maximization Model (times cited: 4)

Abstract: With the rise of various cloud services, redundant data has become an increasingly prominent problem in cloud storage systems. How to assign a set of documents to a distributed file system so that it both reduces storage space and preserves access efficiency as far as possible is an urgent open problem. Space efficiency mainly relies on data de-duplication technologies, while access efficiency requires gathering highly similar files onto the same server.
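The interaction between placement and de-duplication can be seen in a toy model. The sketch below is a minimal illustration, assuming fixed-size chunking, SHA-256 chunk fingerprints, and de-duplication performed independently on each server. The 4-byte chunk size, the function names, and the sample files are hypothetical, and this is not the SAP or FSAP algorithm. It counts the chunks each placement must store and reports a compression ratio, taken here as raw chunks divided by stored chunks.

```python
import hashlib

# Hypothetical chunk size; real systems use KB-scale or content-defined chunks.
CHUNK_SIZE = 4


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return their SHA-256 digests."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def stored_chunks(placement: dict[str, list[bytes]]) -> int:
    """Count chunks stored when each server de-duplicates only its own files."""
    total = 0
    for files in placement.values():
        unique: set[str] = set()
        for f in files:
            unique.update(chunk_hashes(f))
        total += len(unique)
    return total


def compression_ratio(placement: dict[str, list[bytes]]) -> float:
    """Raw chunk count divided by the chunk count actually stored."""
    raw = sum(len(chunk_hashes(f)) for files in placement.values() for f in files)
    return raw / stored_chunks(placement)


a, b = b"AAAABBBBCCCC", b"AAAABBBBDDDD"  # two similar files
c, d = b"XXXXYYYYZZZZ", b"XXXXYYYYWWWW"  # another similar pair
together = {"s1": [a, b], "s2": [c, d]}  # similar files co-located
mixed = {"s1": [a, c], "s2": [b, d]}     # similar files split apart
print(stored_chunks(together), compression_ratio(together))
print(stored_chunks(mixed), compression_ratio(mixed))
```

Co-locating the similar pairs stores 8 chunks (ratio 1.5) instead of 12 (ratio 1.0), which is why similarity-aware placement matters when each server de-duplicates only its own data.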
Building on a study of existing data de-duplication technologies, especially the Similarity-Aware Partitioning (SAP) algorithm, this paper proposes the Frequency and Similarity-Aware Partitioning (FSAP) algorithm for cloud storage, a more reasonable data partitioning algorithm than SAP. The paper also proposes the Space-Time Utility Maximization Model (STUMM), which balances space efficiency against access efficiency. Finally, tests on 100 web files downloaded from CNN show that, compared with algorithms related to SAP (including the SAP-Space-Delta and SAP-Space-Dedup algorithms), the FSAP algorithm based on STUMM achieves a higher compression ratio and a more balanced distribution of data blocks.
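The abstract does not give STUMM's formula or say how FSAP uses access frequency, so the following is only a hypothetical illustration of the kind of space-versus-balance trade-off such a model weighs. It greedily assigns each file, represented as a set of chunk IDs, to the server with the highest weighted score. The weight `alpha`, the score itself, and the name `greedy_place` are assumptions made for this sketch, not the authors' model, and access frequency is not modeled at all.

```python
def greedy_place(files: list[set[str]], num_servers: int, alpha: float = 0.5):
    """Greedily assign each file (a set of chunk IDs) to one server.

    Hypothetical score for placing a file on server i (not STUMM):
      alpha * (chunks already on server i)      -> rewards de-duplication savings
      - (1 - alpha) * (server i size after)     -> penalizes heavily loaded servers
    Ties go to the lowest-numbered server (max() keeps the first maximum).
    Returns the per-file server assignment and each server's stored chunk count.
    """
    servers: list[set[str]] = [set() for _ in range(num_servers)]
    assignment: list[int] = []
    for chunks in files:
        def score(i: int) -> float:
            shared = len(chunks & servers[i])
            load_after = len(servers[i] | chunks)
            return alpha * shared - (1 - alpha) * load_after

        best = max(range(num_servers), key=score)
        servers[best] |= chunks
        assignment.append(best)
    return assignment, [len(s) for s in servers]


# Two similar pairs of files, expressed as chunk-ID sets.
files = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y", "z"}, {"x", "y", "w"}]
for alpha in (1.0, 0.5, 0.0):
    print(alpha, greedy_place(files, num_servers=2, alpha=alpha))
```

With `alpha=1.0` every file lands on one server (loads 8 and 0); with `alpha=0.0` the loads are even but similar files are split, so 12 chunks are stored in total (6 and 6); in this toy case `alpha=0.5` achieves both goals (4 and 4).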
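Source: Tsinghua Science and Technology (English edition of the Journal of Tsinghua University (Science and Technology)), 2015, Issue 3, pp. 233-245 (13 pages); indexed in SCIE, EI, CAS, and CSCD.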
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2015AA01A303).
Keywords: de-duplication; cloud storage; redundancy; frequency