Storage method for flight delay platform based on HBase and Hive

Cited by: 6
Abstract: Existing flight delay platforms in China are hard to port and scale poorly, so they cannot handle the large data volumes produced by the rapid growth of civil aviation. To address this, a cross-platform, broadly applicable, and highly scalable big data platform for flight delays was designed. The platform uses Leaflet as its visualization layer: flight trajectories are displayed in real time on a map interface and the trajectory data are loaded into an HBase database. The row key of the flight data table is redesigned and optimized with the Message-Digest Algorithm 5 (MD5) to solve the "hot spot" problem caused by monotonically increasing flight times. To overcome the shortcomings of multi-level queries with HBase filters, an associated query algorithm based on SolrCloud is proposed, in which SolrCloud stores row keys and index fields in separate layers and thereby provides a fast secondary index for HBase. Finally, a Hive-based data warehouse for massive flight information is built on the historical flight data and flight plan data held in HBase. Experimental results show that the scalability of the platform and the flight information data warehouse meet civil aviation's need for centralized, unified data storage. Multi-condition queries respond more than a hundred times faster than on a cluster without a secondary index, and this advantage grows as the volume of flight data increases.
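The two storage ideas in the abstract, an MD5-derived row key and a secondary index consulted before HBase, can be illustrated with a small sketch. It is not the authors' implementation: the key layout (departure time followed by flight number), the 4-character hash prefix, the `ToySecondaryIndex` class, and the sample flights are all assumptions made for illustration, and plain Python dictionaries stand in for HBase and SolrCloud.

```python
import hashlib
from collections import defaultdict


def salted_row_key(flight_no: str, departure_ts: str, prefix_len: int = 4) -> str:
    """Prefix the natural key with part of its MD5 digest.

    The natural key (hypothetical layout: departure time, then flight number)
    grows monotonically, so consecutive writes would sort next to each other.
    The hash prefix scatters them across the key space instead.
    """
    natural_key = f"{departure_ts}_{flight_no}"
    digest = hashlib.md5(natural_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{natural_key}"


class ToySecondaryIndex:
    """In-memory stand-in for the index layer: (field, value) -> row keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_key, fields):
        # Record the row key under every indexed field value.
        for name, value in fields.items():
            self._postings[(name, value)].add(row_key)

    def query(self, **conditions):
        """Return the row keys that satisfy all conditions (set intersection)."""
        sets = [self._postings.get(item, set()) for item in conditions.items()]
        return set.intersection(*sets) if sets else set()


# Stand-in for the HBase table: row key -> column values (sample data is made up).
table = {}
index = ToySecondaryIndex()
flights = [
    ("CA1501", "201801010800", {"origin": "PEK", "dest": "SHA"}),
    ("MU5102", "201801010805", {"origin": "PEK", "dest": "SHA"}),
    ("CZ3101", "201801010810", {"origin": "CAN", "dest": "PEK"}),
]
for flight_no, ts, cols in flights:
    key = salted_row_key(flight_no, ts)
    table[key] = cols
    index.add(key, cols)

# Multi-condition query: resolve row keys in the index first,
# then fetch each row by key instead of scanning and filtering the table.
hits = index.query(origin="PEK", dest="SHA")
rows = [table[k] for k in sorted(hits)]
```

A hash prefix gives up ordered time-range scans on the row key itself, so lookups by field value have to come from somewhere else; in this sketch the index resolves the matching row keys and the table is only touched with point reads.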
Authors: WU Renbiao; LIU Chao; QU Jingyi (Tianjin Key Laboratory for Advanced Signal Processing, Civil Aviation University of China, Tianjin 300300, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 5, pp. 1339-1345 (7 pages)
Funding: National Natural Science Foundation of China (11402294); Open Fund of the Tianjin Key Laboratory for Advanced Signal Processing (2017ASP-TJ01)
Keywords: big data platform; flight delay; HBase; Hive; SolrCloud; Leaflet