Abstract
This paper focuses on big data query technology, surveying several mainstream query methods and the optimizations built on them. MapReduce is a programming model that splits files stored in HDFS into blocks and then merges the results in order to speed up data queries. Building on this method, an optimized Map-Trim-Reduce programming model is derived and then combined with the Impala query engine. The strength of Map-Trim-Reduce in handling complex data compensates for Impala's weaknesses: the data that Impala would otherwise have to preprocess is processed in advance, which improves the efficiency of big data queries.
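The pipeline described in the abstract can be illustrated with a small in-memory sketch. The abstract does not say what the Trim stage does, so the sketch assumes it prunes intermediate key-value pairs that the query does not need before they reach the reduce stage. The function names (`map_phase`, `trim_phase`, `reduce_phase`, `map_trim_reduce`) and the sample records are hypothetical, and nothing here uses Hadoop or Impala APIs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, float]


def map_phase(records: Iterable[dict],
              mapper: Callable[[dict], Iterable[Pair]]) -> Iterator[Pair]:
    """Apply the mapper to every record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def trim_phase(pairs: Iterable[Pair],
               keep: Callable[[Pair], bool]) -> Iterator[Pair]:
    """ASSUMPTION: Trim drops intermediate pairs the query does not need."""
    return (pair for pair in pairs if keep(pair))


def reduce_phase(pairs: Iterable[Pair],
                 reducer: Callable[[List[float]], float]) -> Dict[str, float]:
    """Group values by key, then reduce each group to a single value."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def map_trim_reduce(records, mapper, keep, reducer) -> Dict[str, float]:
    """Chain the three stages: map, then trim, then reduce."""
    return reduce_phase(trim_phase(map_phase(records, mapper), keep), reducer)


# Hypothetical sample data: one reading per record.
rows = [
    {"well": "A", "output": 12.0},
    {"well": "B", "output": -1.0},  # invalid reading, dropped by trim
    {"well": "A", "output": 8.0},
    {"well": "B", "output": 5.0},
]

totals = map_trim_reduce(
    rows,
    mapper=lambda r: [(r["well"], r["output"])],
    keep=lambda pair: pair[1] >= 0,
    reducer=sum,
)
print(totals)
```

In the setting the abstract describes, the trimmed and reduced output would play the role of the data prepared in advance for Impala to query; that hand-off is not shown here.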
Source
Microcomputer Applications (《微型电脑应用》)
2016, No. 6, pp. 29-31 (3 pages)
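Funding
PetroChina Innovation Foundation research project (2013D-5006-0203)
Heilongjiang Province Key Science and Technology Program (GZ09A120)
Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12521050)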