Hadoop-Based Big Data Computing Technologies

Times cited: 18
Abstract: Big data computing faces three challenges that traditional IT technologies cannot handle: volume (extremely large-scale data), velocity (high-throughput service requests), and variety (heterogeneous data types). Benefiting from the practical deployments and open-source code contributions of major Internet companies in China and abroad, the Apache Hadoop software, which stemmed from Google's GFS and MapReduce, has become a mature software stack and the de facto standard for petabyte-scale data processing. This paper introduces two research efforts on big data computing systems, RCFile for structured data storage and CCIndex for index construction, which effectively address the storage space problem and the query performance problem, respectively.
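The keywords describe RCFile as row-column hybrid data storage. As a rough illustration of that idea, the Python sketch below first splits rows horizontally into row groups and then, inside each group, stores each column's values contiguously and compresses each column chunk separately, so a scan of one column only decompresses that column. This is a simplified in-memory sketch under our own assumptions, not the on-disk RCFile format used by Hive: the function names `build_row_groups`, `read_column`, and `read_row`, the dict layout, the newline separator, and the use of `zlib` are all hypothetical choices for illustration.

```python
import zlib
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, ...]


def build_row_groups(rows: Sequence[Row], group_size: int) -> List[Dict]:
    """Hypothetical row-column hybrid layout (illustration only):
    1. split rows horizontally into row groups of `group_size` rows;
    2. inside each group, store each column's values contiguously;
    3. compress each column chunk independently with zlib."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        columns = []
        for c in range(len(chunk[0])):
            # Assumption: values never contain '\n', so it can act as a separator.
            raw = "\n".join(row[c] for row in chunk).encode("utf-8")
            columns.append(zlib.compress(raw))
        groups.append({"num_rows": len(chunk), "columns": columns})
    return groups


def read_column(groups: List[Dict], col: int) -> List[str]:
    """Scan one column: only that column's chunk in each group is decompressed."""
    values: List[str] = []
    for g in groups:
        raw = zlib.decompress(g["columns"][col]).decode("utf-8")
        values.extend(raw.split("\n"))
    return values


def read_row(groups: List[Dict], index: int) -> Row:
    """Rebuild one full row; all of its fields live in a single row group."""
    for g in groups:
        if index < g["num_rows"]:
            return tuple(
                zlib.decompress(chunk).decode("utf-8").split("\n")[index]
                for chunk in g["columns"]
            )
        index -= g["num_rows"]
    raise IndexError("row index out of range")


if __name__ == "__main__":
    rows = [("1", "alice", "beijing"), ("2", "bob", "shanghai"), ("3", "carol", "beijing")]
    groups = build_row_groups(rows, group_size=2)
    print(read_column(groups, 1))
    print(read_row(groups, 2))
```

In this sketch, keeping all fields of a row inside one row group means a full row can still be rebuilt from a single group, while the per-column chunks let a column scan skip the other columns and give the compressor runs of similar values to work on.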
Author: Li Zha (查礼)
Source: E-science Technology & Application (《科研信息化技术与应用》), 2012, no. 6, pp. 26-33 (8 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (2011AA01A203)
Keywords: Big Data; Hadoop; RCFile (row-column hybrid data storage); complementary clustering index; cloud computing

References

  • [1] Jeffrey Dean and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." Communications of the ACM, vol. 51, no. 1, 2008, pp. 107-113.
  • [2] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. "The Google File System." In Proceedings of the 19th ACM Symposium on Operating Systems Principles, Bolton Landing, NY, 2003, pp. 20-43.
  • [3] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. "Bigtable: A Distributed Storage System for Structured Data." In Proceedings of OSDI 2006, Seattle, WA, 2006, pp. 205-218.
  • [4] Apache Hadoop. http://hadoop.apache.org/
  • [5] Apache Hive. http://hive.apache.org/
  • [6] Apache HBase. http://hbase.apache.org/
  • [7] Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, and Zhiwei Xu. "RCFile: A Fast and Space-Efficient Data Placement Structure in MapReduce-Based Warehouse Systems." In Proceedings of the International Conference on Data Engineering (ICDE 2011), Hannover, Germany, 2011, pp. 1199-1208.
  • [8] Yongqiang Zou, Jia Liu, Shicai Wang, Li Zha, and Zhiwei Xu. "CCIndex: A Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries." In Proceedings of the 2010 IFIP International Conference on Network and Parallel Computing, Zhengzhou, China, 2010, pp. 247-261.

