
Reviewing the big data solution based on Hadoop ecosystem
Cited by: 117
Abstract: A big data solution must address three key problems: storing big data, analyzing it, and managing it. This paper first summarizes the definitions of big data and of the Hadoop ecosystem. It then discusses how to handle big data from two angles, commercial products and the Hadoop ecosystem, and focuses on how the Hadoop ecosystem addresses each problem: (1) HDFS, HBase, and OpenTSDB for storage; (2) Hadoop MapReduce (Hive) and HadoopDB for analysis; and (3) Sqoop, Ganglia, and similar tools for management. For each member of the ecosystem, the paper analyzes its system architecture, implementation principles, and characteristics. For the key members, it also analyzes some of their problems or shortcomings and, drawing on current academic and applied progress together with the authors' own research, proposes solutions, approaches, and viewpoints. The authors predict that the Hadoop ecosystem will be the preferred solution for small and medium-sized enterprises facing big data problems.
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2013, No. 10, pp. 25-35 (11 pages)
Funding: National "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基) Major Project (2010ZX01042-001-003)
Keywords: big data; Hadoop ecosystem; MapReduce; HDFS; column-oriented database
