Abstract
This paper proposes a storage deduplication technique for massive image files that combines a distributed database with a distributed file system. The technique extracts a feature segment from the binary stream of each image file, computes an MD5 signature over that segment, and deduplicates stored image files according to the signature. Analysis of the experimental data confirms that the technique deduplicates images accurately and achieves a high deletion rate. Compared with file-level and block-level deduplication, it also performs better in signature computation time and upload speed, making it an optimisation for massive image data storage. An improvement scheme addressing the shortcomings of the technique is also proposed.
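The signature-based approach described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the feature segment is chosen, so the head, middle, and tail sampling used here, the `SEGMENT_SIZE` value, and the names `feature_signature` and `DedupStore` are all assumptions made for illustration, not the authors' method. Plain in-memory dictionaries stand in for the distributed database (signature index) and the distributed file system (blob storage).

```python
import hashlib

SEGMENT_SIZE = 4096  # hypothetical slice length; not specified in the paper


def feature_signature(data: bytes, segment_size: int = SEGMENT_SIZE) -> str:
    """Return an MD5 hex digest computed over a sampled feature segment.

    Assumption: the feature segment is the file length plus head, middle,
    and tail slices of the byte stream. Small files are hashed in full.
    """
    n = len(data)
    if n <= 3 * segment_size:
        sample = data
    else:
        mid = n // 2 - segment_size // 2
        sample = (
            data[:segment_size]
            + data[mid:mid + segment_size]
            + data[-segment_size:]
        )
    digest = hashlib.md5()
    digest.update(n.to_bytes(8, "big"))  # mix in the length to reduce collisions
    digest.update(sample)
    return digest.hexdigest()


class DedupStore:
    """In-memory stand-in for the database index and the file system."""

    def __init__(self):
        self.index = {}  # signature -> blob id (the "database")
        self.blobs = {}  # blob id -> bytes (the "file system")
        self.names = {}  # logical file name -> signature

    def upload(self, name: str, data: bytes) -> bool:
        """Store data unless its signature is known; return True if stored."""
        sig = feature_signature(data)
        is_new = sig not in self.index
        if is_new:
            blob_id = len(self.blobs)
            self.blobs[blob_id] = data
            self.index[sig] = blob_id
        self.names[name] = sig  # duplicates only add a reference
        return is_new

    def read(self, name: str) -> bytes:
        """Resolve a logical name to the single stored copy."""
        return self.blobs[self.index[self.names[name]]]
```

Hashing only a sample keeps signature time roughly constant as file size grows, which is consistent with the speed advantage the abstract reports. The trade-off in this sketch is that two files of equal length that differ only outside the sampled slices would receive the same signature, so a real system would need a sampling rule suited to the image formats it stores.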
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
Peking University Core Journal (北大核心)
2014, No. 4, pp. 56-58 (3 pages)
Funding
National Natural Science Foundation of China project (61272391)
Keywords
Image file
Deduplication
Distributed
MD5