Abstract
To address the problem that Hadoop cannot fully exploit its performance advantages when it is simply migrated to a virtualized environment, a shared storage device, StoreApp, is designed for the Hadoop virtual worker nodes deployed on the same physical host. The StoreApp host pushes the intermediate data generated by map tasks to the storage node, separating storage from computation. An HDFS prefetching algorithm is designed to handle unaligned reads. StoreApp also uses an automatic cluster-size adjustment technique to determine the optimal cluster size for each job, dynamically changing the number of compute nodes on each host to minimize job completion time. Simulation results show that, compared with the traditional Hadoop scheme, which does not separate computation from storage, and the typical Themis scheme, StoreApp significantly improves HDFS throughput and reduces job completion time.
Author
覃伟荣
QIN Wei-rong (College of Resources and Environment, Qinzhou University, Qinzhou 535011, China)
Source
《计算机工程与设计》
Peking University Core Journal
2018, No. 5, pp. 1319-1325 (7 pages)
Computer Engineering and Design
Funding
Scientific Research Fund Project of the Guangxi Education Department (KY2015YB314)