Distributed storage and retrieval of massive small files metadata (海量小文件元数据的分布式存储与检索)


Abstract: Existing distributed file systems suffer a metadata-processing bottleneck at the master node when handling massive numbers of small files. To address this, the paper proposes storing metadata in distributed files and distributing it through metadata buffering and hash mapping. Metadata retrieval is implemented as a MapReduce parallel program; the paper points out the problems that arise in parallel retrieval and optimizes retrieval with a local bitmap index. Experimental results show that the method achieves distributed storage and retrieval of massive metadata and avoids the single-point performance bottleneck at the master node that existing distributed file systems exhibit when processing massive small files.
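The abstract names three mechanisms (metadata buffering, hash mapping, and a local bitmap index) without giving their details. The following is a minimal single-process sketch of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the class `MetadataStore`, the function `bucket_of`, the record fields `path`, `owner` and `size`, the use of CRC32 as the hash, and the in-memory lists standing in for distributed metadata files. The loop over buckets in `find_by_owner` stands in for what would be independent map tasks in a MapReduce job.

```python
import zlib
from collections import defaultdict


def bucket_of(path: str, num_buckets: int) -> int:
    """Map a file path to a metadata bucket using a stable hash (CRC32)."""
    return zlib.crc32(path.encode("utf-8")) % num_buckets


class MetadataStore:
    """Hypothetical store: buffers records, then flushes them per bucket.

    Each per-bucket list stands in for one distributed metadata file.
    """

    def __init__(self, num_buckets: int = 4, buffer_limit: int = 3):
        self.num_buckets = num_buckets
        self.buffer_limit = buffer_limit
        self.buffers = defaultdict(list)  # bucket -> records not yet flushed
        self.files = defaultdict(list)    # bucket -> flushed records
        self.bitmaps = defaultdict(dict)  # bucket -> {owner: integer bitmap}

    def put(self, path: str, owner: str, size: int) -> None:
        """Buffer one metadata record in the bucket chosen by hash mapping."""
        b = bucket_of(path, self.num_buckets)
        self.buffers[b].append({"path": path, "owner": owner, "size": size})
        if len(self.buffers[b]) >= self.buffer_limit:
            self.flush(b)

    def flush(self, b: int) -> None:
        """Append buffered records to bucket b and update its local bitmap."""
        for rec in self.buffers[b]:
            pos = len(self.files[b])  # position of the record in its bucket
            self.files[b].append(rec)
            bitmap = self.bitmaps[b].get(rec["owner"], 0)
            self.bitmaps[b][rec["owner"]] = bitmap | (1 << pos)
        self.buffers[b].clear()

    def flush_all(self) -> None:
        """Flush every bucket that still holds buffered records."""
        for b in list(self.buffers):
            self.flush(b)

    def find_by_owner(self, owner: str) -> list:
        """Return paths owned by `owner`, reading only positions set in bitmaps."""
        hits = []
        for b in sorted(self.files):  # each bucket would be one map task
            bitmap = self.bitmaps[b].get(owner, 0)
            pos = 0
            while bitmap:
                if bitmap & 1:
                    hits.append(self.files[b][pos]["path"])
                bitmap >>= 1
                pos += 1
        return hits
```

In this sketch a bucket whose bitmap for the queried value is zero is skipped without reading any of its records, which is the kind of saving a per-partition index can offer over scanning every record in every partition.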

Authors: Zhou Guoan (周国安), Li Qiang (李强), Chen Xin (陈新), Hu Xu (胡旭)

Affiliation: Air Force Early Warning Academy (空军预警学院)

Source: Journal of Air Force Early Warning Academy (《空军预警学院学报》), 2014, No. 6, pp. 427-431 (5 pages)

Keywords: massive small files; metadata; distributed storage; parallel retrieval

Classification number: TP333 [Automation and Computer Technology: Computer System Architecture]



