Abstract
To reduce the power and hardware resource consumption of deploying YOLO networks at the edge, this paper proposes a low-power Tiny YOLOv3 network accelerator based on a Field Programmable Gate Array (FPGA). In the convolution-layer IP design, a channel interleaving method accelerates conventional convolution, 16-bit fixed-point numbers reduce the data width, layer grouping reduces data transfer latency, and input/output channel folding reduces hardware resource consumption. In the system implementation, the Tiny YOLOv3 network is configured by setting different topology parameters in Vivado SDK. Experimental results show that, at an operating frequency of 100 MHz, the system is 17 times faster than an Intel CPU and 289 times faster than an ARM CPU. Compared with YOLO implementations based on GPUs and on other FPGA designs, it significantly reduces hardware resource consumption and power consumption.
Authors
LI Qinzuo (李钦祚)
XIAO Dengjun (肖灯军)
LI Qinzuo; XIAO Dengjun (Aerospace Information Research Institute (AIR), Chinese Academy of Sciences (CAS), Beijing 100190, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China)
Source
Electronic Design Engineering (《电子设计工程》)
2022, No. 20, pp. 6-12 (7 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund (61901442).
Keywords
You Only Look Once (YOLO) algorithm
Field Programmable Gate Array (FPGA)
low power
parallel accelerator
dynamic configuration
Convolutional Neural Network (CNN)