Frequency and Similarity-Aware Partitioning for Cloud Storage Based on Space-Time Utility Maximization Model (Cited by: 4)

Abstract: With the rise of cloud services, redundant data has become a more prominent problem in cloud storage systems. Assigning a set of documents to a distributed file system so that storage space is reduced while access efficiency is preserved as far as possible is an urgent open problem. Space efficiency relies mainly on data de-duplication technologies, whereas access efficiency requires gathering highly similar files on the same server. Building on a study of existing data de-duplication technologies, especially the Similarity-Aware Partitioning (SAP) algorithm, this paper proposes the Frequency and Similarity-Aware Partitioning (FSAP) algorithm for cloud storage, which the authors present as a more reasonable data partitioning algorithm than SAP. The paper also proposes the Space-Time Utility Maximization Model (STUMM), which is used to balance space efficiency against access efficiency. In tests on 100 web files downloaded from CNN, the FSAP algorithm based on STUMM achieves a higher compression ratio and a more balanced distribution of data blocks than the SAP-related algorithms (including the SAP-Space-Delta algorithm and the SAP-Space-Dedup algorithm).
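The two quantities the abstract trades off, chunk-level de-duplication and file similarity, can be illustrated with a minimal Python sketch. This is not the FSAP or SAP algorithm, whose details are not given in this record. The fixed-size chunking, the 8-byte chunk size, the use of Jaccard similarity, and the function names `chunk_fingerprints`, `jaccard`, and `dedup_ratio` are all assumptions made for illustration.

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 8) -> set[str]:
    """Split data into fixed-size chunks and return their SHA-256 fingerprints.

    Fixed-size chunking is an illustrative assumption; real systems often
    use content-defined chunking instead.
    """
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two fingerprint sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup_ratio(files: list[bytes], chunk_size: int = 8) -> float:
    """Chunks stored without de-duplication divided by unique chunks stored.

    A higher value means co-locating these files on one server saves more space.
    """
    total_chunks = 0
    unique: set[str] = set()
    for data in files:
        # Ceiling division: a trailing partial chunk still counts as one chunk.
        total_chunks += (len(data) + chunk_size - 1) // chunk_size
        unique |= chunk_fingerprints(data, chunk_size)
    return total_chunks / len(unique) if unique else 1.0


# Hypothetical toy files: file_a and file_b share one chunk, file_c shares none.
file_a = b"AAAAAAAABBBBBBBB"
file_b = b"AAAAAAAACCCCCCCC"
file_c = b"DDDDDDDDEEEEEEEE"

similar = jaccard(chunk_fingerprints(file_a), chunk_fingerprints(file_b))
dissimilar = jaccard(chunk_fingerprints(file_a), chunk_fingerprints(file_c))
ratio_similar = dedup_ratio([file_a, file_b])
ratio_dissimilar = dedup_ratio([file_a, file_c])
```

In this toy setting, placing the two similar files on the same server yields a higher de-duplication ratio than placing the dissimilar pair together, which is the intuition behind similarity-aware partitioning. How FSAP additionally weighs access frequency is not specified in this record and is not modeled here.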

Authors: Jianjiang Li, Jie Wu, Zhanning Ma

Affiliations: Department of Computer Science and Technology; Department of Computer and Information Sciences

Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2015, No. 3, pp. 233-245 (13 pages). Chinese journal title: 清华大学学报（自然科学版）（英文版）

Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2015AA01A303)

Keywords: de-duplication; cloud storage; redundancy; frequency

Classification: TP333 [Automation and Computer Technology: Computer System Architecture]; TU-092 [Architecture Science: Architectural Theory]
