Abstract
To address the low scalability and poor availability of storing QAR data in traditional relational databases, a distributed storage method for QAR data based on HBase is proposed. An HBase table structure is designed around the characteristics of QAR data, and QAR parameters are divided into seven topics: safety, flight track, fuel, engine, prediction, pilot operation, and other. A rowkey structure is built on the MD5 hash of the combination of flight number, flight date, and parameter topic, and the QAR data value table is pre-partitioned according to the rowkey hash values. Distributed storage of QAR data files is achieved through a two-level optimization strategy that combines rowkey hashing with pre-partitioning. Experiments on real QAR datasets show that this storage scheme distributes data evenly across the HBase cluster, avoids write hotspots and data skew, and delivers high read and write performance.
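The two-level strategy described in the abstract (hashed rowkeys plus pre-partitioning) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The field separator, the use of the bare MD5 hex digest as the whole rowkey, the 4-character split prefix, and the function names make_rowkey, split_points, and region_for are all assumptions made for illustration, and the region lookup only simulates how HBase routes a row to a pre-split region.

```python
import bisect
import hashlib
from collections import Counter
from typing import List

# The seven parameter topics named in the abstract (identifiers are illustrative).
TOPICS = ["safety", "track", "fuel", "engine",
          "prediction", "pilot_operation", "other"]


def make_rowkey(flight_no: str, flight_date: str, topic: str) -> str:
    """Hash flight number, flight date, and topic into a 32-char hex rowkey.

    Assumption: fields are joined with '|' and the digest alone is the rowkey.
    """
    raw = f"{flight_no}|{flight_date}|{topic}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def split_points(num_regions: int, prefix_len: int = 4) -> List[str]:
    """Evenly spaced hex split keys that divide the hash space into regions."""
    space = 16 ** prefix_len
    return [format(i * space // num_regions, f"0{prefix_len}x")
            for i in range(1, num_regions)]


def region_for(rowkey: str, splits: List[str]) -> int:
    """Index of the region whose [start, end) key range contains the rowkey."""
    return bisect.bisect_right(splits, rowkey)


if __name__ == "__main__":
    splits = split_points(8)
    counts = Counter()
    # Hypothetical flight numbers and dates, used only to exercise the sketch.
    for n in range(1000):
        for topic in TOPICS:
            key = make_rowkey(f"CA{1000 + n}", "2019-06-01", topic)
            counts[region_for(key, splits)] += 1
    for region in sorted(counts):
        print(region, counts[region])
```

Because MD5 output is close to uniform, consecutive flight numbers and dates land in different regions rather than piling onto one, which is the behaviour the abstract credits with avoiding write hotspots. In a real cluster the split keys would be supplied when the table is created instead of being looked up on the client side.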
Authors
HUO Wei-gang (霍纬纲)
CHENG Wen-li (程文莉)
LI Ji-long (李继龙)
College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2020, No. 5, pp. 1494-F0003 (8 pages)
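Funding
Civil Aviation Joint Research Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (U1633110)
National Natural Science Foundation of China (61301245)
Fundamental Research Funds for the Central Universities, Special Fund of Civil Aviation University of China (3122019190)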