Abstract
To address the low write performance of HDFS, this article proposes a scheduling algorithm for concurrent data-flow transmission based on a scheduling tree. The algorithm divides the nodes of the file system into forwarding nodes and leaf nodes, arranges them into a scheduling tree according to a traffic-matching principle, and then distributes replicas along that tree. As a result, the network adapters of all nodes transmit concurrently and their disks write concurrently, which shortens the time needed to write replicas into the distributed file system. Performance tests show that a distributed file system using this algorithm achieves higher write performance than the original HDFS.
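The abstract does not state the exact traffic-matching rule or tree-building procedure, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that each node has a known outbound NIC bandwidth (`out_bw`), that every replica stream runs at a fixed rate (`stream_rate`), and that "traffic matching" means a forwarding node may feed at most `floor(out_bw / stream_rate)` children at full rate. Leaf nodes only receive and write. The names `Node`, `fan_out`, `build_schedule_tree` and `depth` are hypothetical. Forwarding nodes with the most bandwidth are attached first and breadth-first, so that they sit near the root and the tree stays shallow, in contrast to the linear replica pipeline that HDFS uses by default.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A participant in replica distribution (hypothetical model)."""
    name: str
    out_bw: float                  # assumed outbound NIC bandwidth, e.g. MB/s
    forwarding: bool = False       # True if the node may relay data to others
    children: list = field(default_factory=list)


def fan_out(node: Node, stream_rate: float) -> int:
    """Number of children a node can feed at the full stream rate.

    Assumed traffic-matching rule: outbound bandwidth must cover the sum
    of the child streams. Leaf nodes never forward.
    """
    if not node.forwarding:
        return 0
    return int(node.out_bw // stream_rate)


def build_schedule_tree(root: Node, targets: list, stream_rate: float) -> Node:
    """Attach `targets` under `root` to form a scheduling tree.

    Forwarding nodes are placed first, highest bandwidth first, so they end
    up near the root; leaf nodes fill the remaining slots. Children are
    attached breadth-first to keep the tree shallow.
    """
    forwarders = sorted((n for n in targets if n.forwarding),
                        key=lambda n: n.out_bw, reverse=True)
    leaves = [n for n in targets if not n.forwarding]
    pending = deque(forwarders + leaves)

    frontier = deque([root])  # nodes that may still accept children
    while pending:
        if not frontier:
            raise ValueError("not enough forwarding capacity for all targets")
        parent = frontier[0]
        if len(parent.children) >= fan_out(parent, stream_rate):
            frontier.popleft()  # parent is saturated, move to the next one
            continue
        child = pending.popleft()
        parent.children.append(child)
        if child.forwarding:
            frontier.append(child)  # it can relay to nodes further down
    return root


def depth(node: Node) -> int:
    """Hops from `node` to its deepest descendant."""
    return 0 if not node.children else 1 + max(depth(c) for c in node.children)


if __name__ == "__main__":
    client = Node("client", out_bw=200, forwarding=True)
    targets = [Node("dn1", 200, forwarding=True), Node("dn2", 300, forwarding=True)]
    targets += [Node(f"dn{i}", 100) for i in range(3, 7)]  # leaf nodes
    tree = build_schedule_tree(client, targets, stream_rate=100)
    for parent in [tree] + tree.children:
        print(parent.name, "->", [c.name for c in parent.children])
    print("depth:", depth(tree))
```

With these illustrative numbers the client feeds `dn2` and `dn1`, `dn2` relays to `dn3`, `dn4` and `dn5`, and `dn1` relays to `dn6`. All six targets are reached within two hops, whereas a linear chain through six nodes would need six. Every node on the same level receives and writes at the same time.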
Authors
Gao Yuan, Gu Wenjie, Peng Hui, Chen Peng, Ji Xuechun
(NARI Group Corporation/State Grid Electric Power Research Institute, Nanjing 211106, China; NARI Technology Development Co., Ltd., Nanjing 211106, China; State Key Laboratory of Smart Grid Protection and Control, Nanjing 211106, China)
Source
Jiangsu Science and Technology Information, 2017, No. 27, pp. 40-42, 45 (4 pages)
Keywords
distributed file system
data flow
concurrency
scheduling