
Edge GPU Cluster Communication System for AI Data Flow Processing (cited by: 2)
Abstract: In edge computing scenarios, GPU clusters must handle the large number of AI computing tasks generated by terminal devices. The response time of an AI task in an edge GPU cluster includes not only computation time but also data transmission and queuing delay, so task data transmission and AI data flow scheduling are also key factors in the cluster's data processing performance. The traditional network protocol stack is too inefficient, and dedicated high-speed network equipment too costly, for real-time processing of large-scale AI data flows at the edge. Based on DPDK, this paper proposes a multi-core, multi-NIC parallel communication mechanism that uses the cluster's idle CPU resources to speed up data transmission. It derives a data flow allocation strategy by analyzing each node's real-time processing capacity from both its computing capability and its network load, and implements a dynamic multi-core, multi-buffer model driven by data access volume to reduce the time tasks spend waiting for computation. Experimental results show that the proposed communication scheduling scheme increases cluster data flow capacity by about 30% and reaches 90% bandwidth utilization; for the same total AI task volume, DPDK's efficient packet processing avoids the situation in which a large number of AI tasks are discarded due to transmission failure.
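The abstract does not give the paper's allocation formulas, so the following is only a minimal Python sketch under assumed definitions, not the authors' method, and it does not use DPDK. It assumes a node's real-time capacity is the smaller of its GPU compute headroom and its NIC bandwidth headroom (both in tasks/s), splits incoming flows in proportion to that capacity, and scales the number of receive buffers (one per polling core) with the arrival rate. All names (`Node`, `realtime_capacity`, `allocate`, `buffers_needed`) and parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical per-node snapshot; field names are illustrative, not from the paper."""
    name: str
    gpu_rate: float       # tasks/s the node's GPUs can complete
    link_capacity: float  # usable NIC bandwidth, MB/s
    link_load: float      # bandwidth already in use, MB/s
    backlog: int          # tasks already queued in the node's buffers


def realtime_capacity(node: Node, task_mb: float, horizon_s: float = 1.0) -> float:
    """Extra tasks/s the node can absorb: the tighter of compute and network headroom."""
    # Network headroom: spare bandwidth expressed in tasks/s.
    net = max(node.link_capacity - node.link_load, 0.0) / task_mb
    # Compute headroom: GPU rate minus what is needed to drain the backlog within the horizon.
    compute = max(node.gpu_rate - node.backlog / horizon_s, 0.0)
    return min(net, compute)


def allocate(nodes: list[Node], arrival_rate: float, task_mb: float) -> dict[str, float]:
    """Split incoming flows in proportion to each node's real-time capacity."""
    caps = {n.name: realtime_capacity(n, task_mb) for n in nodes}
    total = sum(caps.values())
    if total == 0.0:
        return {name: 0.0 for name in caps}
    accepted = min(arrival_rate, total)  # never assign more than the cluster can absorb
    return {name: accepted * cap / total for name, cap in caps.items()}


def buffers_needed(arrival_rate: float, per_buffer_rate: float,
                   min_buffers: int = 1, max_buffers: int = 8) -> int:
    """Scale receive buffers (one per polling core) with the observed arrival rate."""
    wanted = math.ceil(arrival_rate / per_buffer_rate)
    return max(min_buffers, min(max_buffers, wanted))


if __name__ == "__main__":
    cluster = [
        Node("gpu-a", gpu_rate=100.0, link_capacity=1000.0, link_load=200.0, backlog=20),
        Node("gpu-b", gpu_rate=50.0, link_capacity=1000.0, link_load=900.0, backlog=0),
    ]
    print(allocate(cluster, arrival_rate=65.0, task_mb=2.0))
    print(buffers_needed(arrival_rate=250.0, per_buffer_rate=100.0))
```

With these made-up numbers, gpu-a is compute-bound (80 tasks/s of headroom) and gpu-b is network-bound (50 tasks/s), so 65 tasks/s of arrivals are split 40/25, and an arrival rate of 250 tasks/s at 100 tasks/s per buffer calls for 3 buffers. Taking the minimum of the two headrooms is just one simple way to combine computing capability and network load; the paper may weight them differently.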
Authors: TU Cong (涂聪), CHEN Qing-kui (陈庆奎) (School of Optical-electrical Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》; indexed in CSCD and the Peking University Core Journals list), 2022, No. 6, pp. 1147-1153 (7 pages)
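Funding: National Natural Science Foundation of China (61572325); Shanghai Key Science and Technology Research Project (19DZ1208903); Shanghai Engineering Center for Common Technologies of Large-Scale IoT in Smart Homes (GCZX14014).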
Keywords: AI data flow; edge GPU cluster; DPDK; data flow distribution; dynamic multi-core and multi-buffer