Abstract
To study how network architecture and communication protocols affect the efficiency of big data processing and application systems, different network architectures and communication protocols were introduced and analyzed. Prototype Hadoop, Tachyon and Spark systems were built on Ethernet and InfiniBand interconnects running the TCP/IP, IPoIB and RDMA protocols, and were tested with common benchmarking tools and sample programs. The results show that, compared with TCP/IP, IPoIB improves Hadoop I/O performance by 4.6 to 5.6 times, reduces the time overhead of Tachyon data processing by 2% to 27%, and reduces the time overhead of Spark by 90% to 95%; Spark performance is also reported to improve by 46 times. Compared with IPoIB, RDMA reduces system overhead by a further 3% to 15%. The analysis concludes that a high-speed network architecture and efficient communication protocols can effectively improve the I/O performance, processing efficiency and adaptability of big data processing systems.
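The abstract reports some results as a percentage reduction in time overhead and others as a multiple ("times"). As a reading aid only, the following minimal sketch converts between the two forms, assuming that "performance" means inverse run time and that an "N times" figure means an N-fold speedup. Both assumptions, and the function names, are illustrative and are not taken from the paper, which does not state in the abstract which workloads each figure refers to.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Speedup factor implied by cutting run time by `reduction` (a fraction in [0, 1))."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    # New time is (1 - reduction) of the old time, so speedup is old / new.
    return 1.0 / (1.0 - reduction)


def time_reduction_from_speedup(speedup: float) -> float:
    """Fractional run-time reduction implied by an N-fold speedup (N >= 1)."""
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Percentages quoted in the abstract, expressed as fractions.
    for r in (0.02, 0.27, 0.90, 0.95):
        print(f"{r:.0%} less time -> {speedup_from_time_reduction(r):.2f}x speedup")
    # The "46 times" figure, read as a 46-fold speedup (an assumption).
    print(f"46x speedup -> {time_reduction_from_speedup(46):.1%} less time")
```

Under these assumptions the percentage figures and the multiple do not map onto each other one to one, so they are best read as describing different measurements rather than the same one.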
Source
《江苏大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2016, No. 4, pp. 429-437 (9 pages)
Journal of Jiangsu University: Natural Science Edition
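Funding
National Natural Science Foundation of China (61300228)
Natural Science Foundation of Jiangsu Province (BK20140570)
Natural Science Foundation of Zhejiang Province (LY13F020012)
Key R&D Program of the Jiangsu Provincial Department of Science and Technology, Industry Foresight and Common Key Technologies Project (BE2015137)
Jiangsu Province Science and Technology Support Program (BE2013103)
Shenzhen Science and Technology Project (JCYJ20130401095947222)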