Abstract
The growing demand of data-intensive deep learning applications such as image recognition cannot be met without the support of dedicated hardware such as GPUs and FPGAs. Building on hardware offloading and a dataflow architecture, this paper proposes a processing model that offloads deep learning computation onto a smart network interface card (SmartNIC) with a many-core structure, so that data bypasses the CPU and the operating system kernel and is processed in the network. By partitioning the SmartNIC's compute resources and decomposing the deep learning model, the portability of deep learning models to a many-core SmartNIC, a low-cost general-purpose device, is verified: the effective structure of the AlexNet neural network is migrated onto an Agilio NIC to realize in-network computing for data-intensive deep learning applications, and a pipelined design is used to improve the throughput and parallelism of in-network data processing on the many-core SmartNIC. Experiments show that the system processes image data with high throughput while keeping processing latency at the microsecond level.
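To make the pipelined decomposition concrete, the following is a minimal host-side sketch, not the paper's actual SmartNIC firmware: each pipeline stage stands in for a group of NIC micro-engines that owns one slice of the network's layers, and queues stand in for inter-engine data transfer. The stage boundaries, layer shapes, and weights below are illustrative assumptions only.

```python
"""Illustrative pipeline-parallel inference sketch (assumed, not the paper's code)."""
import queue
import threading
import numpy as np

def stage_worker(fn, q_in, q_out):
    """One pipeline stage: pull an item, apply its layer slice, push downstream."""
    while True:
        item = q_in.get()
        if item is None:          # sentinel: propagate shutdown and exit
            q_out.put(None)
            return
        idx, x = item
        q_out.put((idx, fn(x)))

# Toy layer slices standing in for AlexNet's conv / fully-connected groups.
# Shapes are assumptions chosen only to keep the example small.
W1 = np.random.rand(128, 64)
W2 = np.random.rand(64, 32)
W3 = np.random.rand(32, 10)
stages = [
    lambda x: np.maximum(x @ W1, 0),   # "conv" group with ReLU
    lambda x: np.maximum(x @ W2, 0),   # "fc" group with ReLU
    lambda x: x @ W3,                  # classifier logits
]

# Wire the stages together with queues: one worker per stage, like one
# micro-engine group per layer slice on the SmartNIC.
queues = [queue.Queue() for _ in range(len(stages) + 1)]
threads = [threading.Thread(target=stage_worker,
                            args=(fn, queues[i], queues[i + 1]),
                            daemon=True)
           for i, fn in enumerate(stages)]
for t in threads:
    t.start()

# Feed a small batch of flattened "images" and drain the results.
for i in range(4):
    queues[0].put((i, np.random.rand(1, 128)))
queues[0].put(None)

while True:
    item = queues[-1].get()
    if item is None:
        break
    idx, y = item
    print(f"image {idx}: logits shape {y.shape}")
```

Because every stage works on a different image at the same time, throughput is bounded by the slowest stage rather than by the whole model, which is the effect the pipelined design on the many-core SmartNIC aims for.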
Authors
SHEN Shuo; XING Kai (School of Software Engineering, University of Science and Technology of China, Jiangsu 215123, China; School of Computer Science and Technology, University of Science and Technology of China, Anhui 230027, China)
Source
Electronic Technology (Shanghai), 2022, No. 8, pp. 28-33 (6 pages)
Funding
National Natural Science Foundation of China (NSFC 61332004).
Keywords
smart network interface card (SmartNIC)
in-network computing
low latency
neural network