Abstract
A big data solution must address three crucial problems: big data storage, big data analysis, and big data management. This paper first summarizes the definitions of big data and of the Hadoop ecosystem. It then discusses how to deal with big data from two perspectives, commercial products and the Hadoop ecosystem, and focuses on the solution built from the Hadoop ecosystem: (1) HDFS, HBase and OpenTSDB address storage; (2) Hadoop MapReduce (with Hive) and HadoopDB address analysis; and (3) Sqoop, Ganglia and related tools address management. For each component, its architecture, implementation principles and features are analyzed. For the key components, we also identify existing problems or shortcomings and, drawing on current academic and industrial progress as well as our own research, propose solutions, approaches and viewpoints. We expect the Hadoop ecosystem to become the preferred solution for small and medium-sized enterprises facing big data problems.
Source
Computer Engineering & Science (计算机工程与科学)
CSCD (Chinese Science Citation Database)
PKU Core Journal (北大核心)
2013, No. 10, pp. 25-35 (11 pages)
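Funding
National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基), grant no. 2010ZX01042-001-003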