Abstract
In a Hadoop cluster, operations such as adding or removing nodes and deleting files can leave data unevenly distributed across nodes. Data load balancing has a significant impact on cluster performance. This paper first analyzes existing load balancing algorithms and then proposes a load balancing algorithm for heterogeneous clusters based on node performance and remaining space. The algorithm computes a theoretical space utilization for each node from the node's remaining space and performance, and dynamically adjusts each node's maximum load factor according to the storage space utilization of the cluster. Experimental results show that the proposed algorithm brings a heterogeneous cluster to the intended balanced state with respect to performance and remaining space, in which nodes with higher performance and more remaining space have higher space utilization.
Author
陈林
CHEN Lin (College of Computer Science, Sichuan University, Chengdu 610065)