Abstract
With the growth of big data, Hadoop has become one of the important tools for big data processing. In practice, I/O operations limit how far Hadoop's performance can be improved. Hadoop usually reduces I/O by compressing data in software, but software compression is slow, so this work replaces it with a hardware compression accelerator. Because Hadoop runs on the Java virtual machine, it cannot call a low-level I/O hardware compression accelerator directly. Two key problems therefore have to be solved: obtaining the data for compression from the Hadoop system, and streaming that data to the I/O hardware compression accelerator. Both are addressed by implementing Hadoop compressor/decompressor classes and designing a C++ dynamic link library, which together integrate the I/O hardware compression accelerator into the Hadoop framework. Experimental results show that the accelerator's compression speed per hertz is 15.9 Byte/s/Hz, and that integrating it improves Hadoop system performance by a factor of two.
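The integration pattern described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. `StagingCompressor` and `AcceleratorBridge` are hypothetical names; the method shape (`setInput`, `finish`, `compress`, `finished`) only mirrors the general style of a Hadoop compressor class. `java.util.zip.Deflater` stands in for the hardware accelerator, which a real integration would presumably reach through a native method backed by the C++ dynamic link library.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical bridge. Assumption: a real version would declare native
 *  methods and load the C++ library; Deflater/Inflater stand in here. */
final class AcceleratorBridge {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the zlib resources
        }
    }

    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported stream");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt stream", e);
        } finally {
            inflater.end();
        }
    }
}

/** Hypothetical compressor-style class: stages the caller's data,
 *  hands it to the bridge on finish(), then lets the caller drain the result. */
final class StagingCompressor {
    private final ByteArrayOutputStream staged = new ByteArrayOutputStream();
    private byte[] result; // compressed bytes, available after finish()
    private int drained;   // how many result bytes were already handed out

    /** Step 1: obtain the data to compress from the framework. */
    void setInput(byte[] b, int off, int len) {
        staged.write(b, off, len);
    }

    /** Step 2: pass the staged data to the (stand-in) accelerator. */
    void finish() {
        result = AcceleratorBridge.compress(staged.toByteArray());
        drained = 0;
    }

    /** Copies up to len compressed bytes into b; returns the number copied. */
    int compress(byte[] b, int off, int len) {
        if (result == null) {
            return 0;
        }
        int n = Math.min(len, result.length - drained);
        System.arraycopy(result, drained, b, off, n);
        drained += n;
        return n;
    }

    boolean finished() {
        return result != null && drained == result.length;
    }
}
```

A caller would feed blocks with `setInput`, call `finish`, and then call `compress` in a loop until `finished` returns true. Buffering everything before one accelerator call keeps the sketch short; a production design would more likely stream fixed-size blocks to amortize the cost of crossing the native boundary.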
Source
《计算机工程与科学》
CSCD
Peking University Core Journals (北大核心)
2016, No. 8, pp. 1524-1529 (6 pages)
Computer Engineering & Science
Funding
Project funded by Huawei Technologies Co., Ltd. (YB2014100047)