Abstract
Social networking services (SNS) and e-commerce services have developed rapidly, and such services need to store large numbers of small files such as pictures, music files and microblog texts. Traditional distributed storage systems such as HDFS (Hadoop distributed file system) are designed for large files. When storing small files they incur excessive metadata overhead and high access latency, which makes them unsuitable for workloads with massive numbers of small files. This paper analyzes the architecture and read/write flow of TFS (Taobao file system) and finds that TFS must establish at least three network connections for every read or write, which increases read/write latency. To address the challenges of massive small-file storage and the problems of TFS, this paper proposes a new low-latency, high-availability distributed storage scheme for massive small files and implements it as the distributed file system SFFS (small-file file system). Performance experiments show that, compared with TFS, SFFS reduces write latency by 76.6% and read latency by about 10%. An analysis of the system structure further shows that, compared with TFS, the central node in SFFS carries a lighter load and recovers from failures more quickly, giving SFFS an advantage in availability.
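TFS opens at least three network connections for each read or write, and the cost of doing so can be illustrated with a toy model. The sketch below is a back-of-envelope illustration only, not the paper's method or measurements: it assumes each sequential TCP connection pays one round trip for the handshake and one for the request/response exchange, plus a fixed service time. The class `NetworkModel`, the helper `reduction_pct`, the single-connection comparison path and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NetworkModel:
    """Toy latency model; parameter values are hypothetical, not measured."""

    rtt_ms: float      # assumed round-trip time between two nodes
    service_ms: float  # assumed server-side processing time per request

    def request_latency(self, connections: int) -> float:
        """Latency of a request that opens `connections` sequential TCP connections.

        Assumption: each new connection costs one RTT for the TCP handshake,
        one RTT for the request/response exchange, plus the service time.
        """
        if connections < 1:
            raise ValueError("a request needs at least one connection")
        return connections * (2 * self.rtt_ms + self.service_ms)


def reduction_pct(baseline: float, improved: float) -> float:
    """Relative latency reduction in percent: (baseline - improved) / baseline * 100."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    model = NetworkModel(rtt_ms=0.5, service_ms=1.0)
    three = model.request_latency(3)  # lower bound reported for TFS: three connections
    one = model.request_latency(1)    # hypothetical single-connection path for comparison
    print(f"3 connections: {three:.1f} ms, 1 connection: {one:.1f} ms")
    print(f"reduction: {reduction_pct(three, one):.1f}%")
```

Under these assumptions latency grows linearly with the number of sequential connections, so removing connections from the critical path cuts latency roughly in proportion; real systems also differ in connection reuse, replication and disk I/O, which this model ignores.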
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Indexed in CSCD (Chinese Science Citation Database)
2014, No. 4, pp. 438-445 (8 pages)
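Funding
National Natural Science Foundation of China, Grant No. 61272167
National High Technology Research and Development Program of China (863 Program), Grant No. 2011AA01A204
National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (核高基), Grant No. 2012ZX01039-004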
Keywords
small file
low latency, high availability
distributed storage