Research on Massive File Retrieval Method Based on MapReduce
Abstract: File retrieval is currently implemented mainly on top of a database. When the volume of data to be searched becomes very large, however, this approach concentrates a large number of retrieval operations on a single host, and retrieval efficiency drops. To address this, the paper proposes using a distributed system in which retrieval is implemented on the MapReduce architecture, so that the retrieval load is spread across the nodes of the system. This relieves the pressure on any single machine and greatly improves retrieval efficiency. Retrieving 1 million records takes 500 s with the traditional approach but only 40 s with the MapReduce-based distributed system, an improvement in retrieval efficiency of close to 12.5 times.
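Source: Journal of Hechi University, 2016, No. 2, pp. 101-105 (5 pages)
Funding: Guangxi Higher Education Science and Technology Research Project (LX2014320); CALIS Guangxi Zhuang Autonomous Region Document and Information Service Center Pre-research Project (LALISGX2014006)
Keywords: big data; MapReduce; retrieval; distributed system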
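The abstract describes spreading a keyword search across the nodes of a distributed system and merging the partial results. The sketch below illustrates that idea only and is not the authors' implementation: it assumes records are `(file_name, text)` pairs, uses threads in one process as stand-ins for cluster nodes, and every function name (`split_into_shards`, `map_search`, `reduce_matches`, `search`, `speedup`) is hypothetical. The last helper simply reproduces the arithmetic behind the reported figure, 500 s / 40 s = 12.5.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(records, num_shards):
    """Partition records round-robin, one shard per simulated node."""
    return [records[i::num_shards] for i in range(num_shards)]


def map_search(shard, keyword):
    """Map phase: emit (keyword, file_name) for each matching record in a shard."""
    return [(keyword, name) for name, text in shard if keyword in text]


def reduce_matches(mapped_outputs):
    """Reduce phase: group the per-shard pairs by key and sort the file names."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, name in pairs:
            grouped[key].append(name)
    return {key: sorted(names) for key, names in grouped.items()}


def search(records, keyword, num_shards=4):
    """Run the map phase on every shard concurrently, then reduce the results."""
    shards = split_into_shards(records, num_shards)
    # Assumption: threads stand in for the worker nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        mapped = list(pool.map(lambda shard: map_search(shard, keyword), shards))
    return reduce_matches(mapped).get(keyword, [])


def speedup(baseline_seconds, distributed_seconds):
    """Ratio of the single-host time to the distributed time."""
    return baseline_seconds / distributed_seconds


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        ("a.txt", "notes on mapreduce jobs"),
        ("b.txt", "database index tuning"),
        ("c.txt", "mapreduce and distributed retrieval"),
        ("d.txt", "big data overview"),
    ]
    print(search(sample, "mapreduce"))
    print(speedup(500, 40))
```

In a real deployment the shards would live on separate machines and a framework such as Hadoop would schedule the map and reduce tasks; the single-process version above only shows how the work is divided and recombined.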
