
Research on a Performance-Optimized Small File Storage Access Strategy (cited by: 20)

A Strategy of Small File Storage Access with Performance Optimization
Abstract: In distributed file systems, small-file management typically suffers from poor access performance, low disk space utilization, and high file transfer delay. To address these problems, this paper proposes a performance-optimized small file storage access (SFSA) strategy. SFSA stores logically contiguous data in contiguous space on the physical disk wherever possible, uses a cache to play the role of the metadata server, and raises cache utilization through simplified file information nodes; together these improve small-file access performance. When writing, SFSA aggregates the updated (dirty) data and the related data in its folder domain into a single I/O request, which reduces the number of file fragments and improves storage space utilization. During file transfer, SFSA exploits the principle of locality and sends batches of frequently accessed small files ahead of time, which lowers the overhead of establishing network connections and improves file transfer performance. Theoretical analysis and experimental results show that the design and methods of SFSA effectively optimize small-file storage access performance.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2012, No. 7, pp. 1579-1586 (8 pages)
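Funding: National Natural Science Foundation of China (60573145); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (200805610019); Guangzhou Science and Technology Program, Applied Basic Research Fund (2010Y1-C681)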
Keywords: distributed file system; small file storage; small file storage access (SFSA); block optimization; access performance
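
The write-aggregation idea described in the abstract (combining updated data with related data from the same folder domain into a single I/O request) can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: `PendingWrite`, `aggregate_writes`, and the use of a folder path as the "folder domain" key are hypothetical names chosen for illustration, and no real disk I/O is performed.

```python
from dataclasses import dataclass


@dataclass
class PendingWrite:
    """A buffered small-file update (hypothetical structure)."""
    folder: str   # assumed stand-in for the paper's "folder domain"
    name: str     # file name within the folder
    data: bytes   # new contents of the file


def aggregate_writes(pending):
    """Pack pending writes of each folder into one contiguous buffer.

    Returns {folder: (buffer, index)}, where index maps each file name
    to its (offset, length) inside the buffer. Each buffer could then be
    issued as a single I/O request instead of one request per file.
    """
    # Group by folder; a later write to the same file replaces the earlier one.
    grouped = {}
    for w in pending:
        grouped.setdefault(w.folder, {})[w.name] = w.data

    batches = {}
    for folder, files in grouped.items():
        buf = bytearray()
        index = {}
        for name, data in files.items():
            index[name] = (len(buf), len(data))  # where this file starts
            buf += data                          # lay files out back to back
        batches[folder] = (bytes(buf), index)
    return batches


pending = [
    PendingWrite("/logs", "a.txt", b"alpha"),
    PendingWrite("/img", "x.bin", b"\x01\x02"),
    PendingWrite("/logs", "b.txt", b"beta"),
]
batches = aggregate_writes(pending)
logs_buffer, logs_index = batches["/logs"]
```

In this sketch the three pending writes collapse into two buffers, one per folder, so related files end up adjacent in a single request; whether the real SFSA layout matches this is not stated in the abstract.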



