Abstract
With the growth of cloud computing, cloud storage uses cluster applications, virtualization, and distributed file systems to make large numbers of heterogeneous networked storage devices work together, easing the storage pressure on traditional data centers. Data de-duplication, a technique that reduces both storage space and network traffic, is likely to be adopted in cloud storage as the cloud becomes more widespread, and combining the two technologies should bring practical benefits to the IT storage industry. Drawing on a study of both technologies, this paper designs a de-duplication architecture based on cloud storage and proposes a scheme that performs in-line de-duplication at the client, combining chunk-level and byte-level de-duplication, before the data is stored in the cloud. In this architecture, the bulk data is stored in HDFS, while the hash values of the file data chunks are stored in HBase.
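As a rough illustration of the chunk-level part of this scheme, the following is a minimal sketch, not the paper's implementation. It assumes fixed-size chunks and SHA-256 hashes, neither of which the abstract specifies, and it omits the byte-level stage entirely. The names `DedupClient`, `hash_index`, `chunk_store`, and `CHUNK_SIZE` are hypothetical: the dictionary stands in for the HBase hash table and the list stands in for chunk storage in HDFS, so no real HBase or HDFS API is used.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; the abstract does not give one


class DedupClient:
    """Toy in-line, client-side, chunk-level de-duplicator (illustrative only)."""

    def __init__(self):
        self.hash_index = {}   # stand-in for the HBase table: chunk hash -> chunk id
        self.chunk_store = []  # stand-in for HDFS: each unique chunk stored once

    def put(self, data: bytes) -> list:
        """De-duplicate data before upload and return the file recipe (chunk ids)."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_id = self.hash_index.get(digest)
            if chunk_id is None:
                # First time this chunk is seen: store it and record its hash.
                chunk_id = len(self.chunk_store)
                self.chunk_store.append(chunk)
                self.hash_index[digest] = chunk_id
            # Duplicate chunks only add a reference to the recipe, not new data.
            recipe.append(chunk_id)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunk_store[i] for i in recipe)
```

In this sketch, uploading a file whose first and third chunks are identical stores only two chunks, and uploading the same file a second time stores nothing new; only the recipe is returned.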
Source
《计算机系统应用》
2013, No. 1, pp. 208-211 (4 pages)
Computer Systems & Applications