Abstract
As the parameter counts and computational costs of neural networks grow, pipelined deployment on resource-constrained hardware platforms becomes increasingly difficult. This paper proposes a method for deploying deep learning models on small edge computing platforms. The method starts from a depthwise separable network model applied to a custom dataset. On the software side, the model is compressed through transfer learning, sensitivity analysis, pruning, and quantization. On the hardware side, a pipelined hardware accelerator suited to resource-limited FPGAs is analyzed and designed. Experimental results show that, after software-side compression, the quantized deployment model achieves 94.60% accuracy with 16.64 M fixed-point operations per inference and 0.079 M parameters. After hardware resource optimization, pipelined deployment on a Chinese-made FPGA development board reaches an inference rate of 366 FPS and a computational energy efficiency of 8.57 GOPS/W. The work offers a solution for high-performance deployment of deep learning models on small edge computing platforms.
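The arithmetic behind two of the abstract's claims can be sketched as follows. This is a minimal illustration, not the authors' code: the layer shape (32 input channels, 64 output channels, 3x3 kernel) is a hypothetical example, biases are ignored, and the power estimate assumes the reported GOPS/W figure is computed from the same per-inference operation count, which the abstract does not state.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def effective_gops(ops_per_inference: float, fps: float) -> float:
    """Sustained throughput in giga-operations per second."""
    return ops_per_inference * fps / 1e9


# Hypothetical layer shape, chosen only for illustration.
std = standard_conv_params(32, 64, 3)
dws = depthwise_separable_params(32, 64, 3)
ratio = dws / std  # equals 1/c_out + 1/k**2 for this layer type

# Figures reported in the abstract.
gops = effective_gops(16.64e6, 366)
# Implied power draw, ASSUMING GOPS/W was derived from the same op count.
implied_watts = gops / 8.57

print(f"standard: {std}, depthwise separable: {dws}, ratio: {ratio:.3f}")
print(f"throughput: {gops:.2f} GOPS, implied power: {implied_watts:.2f} W")
```

Under these assumptions, the depthwise separable layer needs roughly one eighth of the weights of the standard layer, and the reported frame rate corresponds to about 6.09 GOPS of sustained throughput.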
Authors
孟群康
李强
赵峰
庄莉
王秋琳
陈锴
罗军
常胜
Meng Qunkang; Li Qiang; Zhao Feng; Zhuang Li; Wang Qiulin; Chen Kai; Luo Jun; Chang Sheng (School of Physics and Technology, Wuhan University, Wuhan 430072, China; State Grid Information & Telecommunication Co., Ltd., Beijing 102211, China; Fujian Yirong Information Technology Co., Ltd., Fuzhou 350003, China; Institute of Electronic Fifth Research Dept., Ministry of Industry and Information Technology, Guangzhou 510507, China)
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal
2024, Issue 3, pp. 861-865, 879 (6 pages)
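Funding
National Natural Science Foundation of China (62074116, 61874079)
Guangdong Basic and Applied Basic Research Foundation (2021A1515110939)
Wuhan University Luojia Young Scholars Fund
Power Grid Artificial Intelligence Model Optimization Research Project (SGITYLYRWZXX2202264)
Wuhan Knowledge Innovation Special Project (2023010201010077)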
Keywords
edge computing
depthwise separable convolution
pipelined deployment
hardware accelerator
FPGA