Abstract
Convolutional neural networks (CNNs) are widely used in image processing, and CNN-based object detection models such as YOLO have proven to be state of the art in many applications. Because CNNs place very heavy demands on computing power and memory bandwidth, they usually have to be deployed on dedicated hardware platforms. With its high performance, low power consumption, and reconfigurability, the FPGA has become an effective hardware accelerator for CNNs. Previous FPGA-based object detection accelerators mainly used the conventional convolution algorithm, whose high computational complexity limits accelerator performance. This paper therefore designs an object detection accelerator based on the Winograd algorithm. Taking the interaction between modules into account, a module fusion strategy merges the convolutional-layer and pooling-layer modules, which reduces data movement and the number of off-chip memory accesses and improves the overall performance of the accelerator. Using the YOLO2 model as a case study, the data access pattern, the pooling kernel, parameter reordering, and data path optimization are analyzed and designed, and the accelerator is deployed on a U280 board. Experimental results show that quantization reduces mAP by 0.96% and that the accelerator reaches 249.65 GOP/s, 4.4 times the performance reported on Xilinx's official website.
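The abstract names two techniques, Winograd convolution and fusing the convolution and pooling modules, without showing the arithmetic behind them. The Python sketch below illustrates both under stated assumptions; it is not the paper's hardware design. It uses the widely used Winograd F(2×2, 3×3) transform matrices, which compute a 2×2 output tile from a 4×4 input tile with 16 element-wise multiplications instead of the 36 needed by direct 3×3 convolution. The fusion part assumes 2×2, stride-2 max pooling (the pooling shape used in YOLOv2) and an even-sized convolution output, so each 2×2 Winograd output tile maps onto exactly one pooling window and can be pooled immediately instead of being written back. Single-channel input, stride 1, and no padding are further simplifying assumptions, and all function names (`winograd_tile`, `conv_pool_fused`, and so on) are hypothetical.

```python
# Illustrative sketch only (assumptions, not the paper's design):
# single channel, stride-1 "valid" 3x3 convolution (cross-correlation, as in
# CNNs), 2x2/stride-2 max pooling, and an even-sized convolution output.

# Standard Winograd F(2x2, 3x3) transform matrices.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def transform_filter(g):
    """U = G g G^T (4x4); computed once per filter and reused for every tile."""
    return matmul(matmul(G, g), transpose(G))


def winograd_tile(d, u):
    """2x2 output tile from a 4x4 input tile d: Y = A^T [U * (B^T d B)] A.
    The element-wise product uses 16 multiplications (direct conv needs 36)."""
    v = matmul(matmul(BT, d), transpose(BT))
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, m), transpose(AT))


def conv_pool_fused(x, g):
    """Winograd conv fused with 2x2 max pooling: each 2x2 output tile is
    pooled immediately, so the full conv feature map is never stored."""
    out_h, out_w = len(x) - 2, len(x[0]) - 2
    assert out_h % 2 == 0 and out_w % 2 == 0, "sketch assumes even output size"
    u = transform_filter(g)
    pooled = []
    for r in range(0, out_h, 2):          # neighbouring 4x4 input tiles overlap by 2
        row = []
        for c in range(0, out_w, 2):
            d = [x[r + i][c:c + 4] for i in range(4)]
            y = winograd_tile(d, u)
            row.append(max(y[0][0], y[0][1], y[1][0], y[1][1]))
        pooled.append(row)
    return pooled


def conv_pool_direct(x, g):
    """Reference path: direct 3x3 convolution, then a separate 2x2 max pool."""
    out_h, out_w = len(x) - 2, len(x[0]) - 2
    y = [[sum(x[r + i][c + j] * g[i][j] for i in range(3) for j in range(3))
          for c in range(out_w)] for r in range(out_h)]
    return [[max(y[r][c], y[r][c + 1], y[r + 1][c], y[r + 1][c + 1])
             for c in range(0, out_w, 2)] for r in range(0, out_h, 2)]


if __name__ == "__main__":
    x = [[(3 * r + 5 * c) % 7 - 3 for c in range(6)] for r in range(6)]
    g = [[1, 0, -1], [2, 1, 0], [0, -1, 1]]
    print(conv_pool_fused(x, g))   # should match the reference below
    print(conv_pool_direct(x, g))
```

The filter transform is hoisted out of the tile loop because filters are fixed at inference time; on an FPGA this corresponds to transforming weights offline, which is one common reason Winograd designs reorder and pre-process parameters. The fused path only ever holds one 2×2 output tile, which is the property that lets a hardware pipeline skip writing the convolution output off-chip before pooling.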
Authors
LI Bin; QI Yan-rong; ZHOU Qing-lei (School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China; School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China)
Source
Acta Electronica Sinica, 2022, No. 10, pp. 2387-2397 (11 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals