Abstract
Hive, a data warehouse that runs on Hadoop, lets more users process Hadoop data through a SQL-like interface. However, Hive does not provide an efficient approach to joins, a common but expensive operation in Hadoop. To address join performance in Hadoop, this paper proposes HJ-A, a hybrid join algorithm that automatically selects the relatively better method among several join algorithms according to the current application scenario. Experimental results show that HJ-A performs well in most Hadoop scenarios.
Source
《计算机与现代化》
2015, No. 6, pp. 86-91 (6 pages)
Computer and Modernization