
Design and development of real-time query platform for big data based on Hadoop (cited by: 1)

Abstract: This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database (HBase). The platform consists of four tiers: an ETL (extraction, transformation, loading) tier, a data processing tier, a data storage tier, and a data display tier. Together they provide long-term storage, real-time analysis, and real-time query of massive data. Finally, a real dataset cluster made up of 39 nodes (2 master nodes and 37 data nodes) is simulated. Function tests are performed on the data importing module and the real-time query module, and performance tests are performed on HDFS I/O, the MapReduce cluster, batch loading, and real-time query of massive data. The test results indicate that the platform achieves high performance in terms of response time and linear scalability.
Source: High Technology Letters (高技术通讯, English edition), indexed in EI and CAS, 2015, No. 2, pp. 231-238 (8 pages)
Funding: Supported by the National Science and Technology Support Project (No. 2012BAH01F02) from the Ministry of Science and Technology of China and the Director Fund (No. IS201116002) from the Institute of Seismology, CEA
Keywords: real-time query, platform framework, development, design, data input module, data storage layer, function test, performance test, big data, massive data storage, Hadoop, distributed computing

