Abstract
Object detection networks achieve high detection accuracy, but their heavy computational load makes it difficult for conventional hardware to meet real-time requirements. To address this, an FPGA-based neural network accelerator for low-latency object detection is proposed. The accelerator supports highly parallel sparse convolution, which reduces computation latency. It also uses a centralized storage array structure that allows data exchange between the storage array and the computing array without a one-to-one mapping between them. With the YOLOv3 deep neural network implemented on a Xilinx VCU118 development board, the accelerator achieves a single-frame latency of only 24.36 ms, a throughput of 2704 GOPS, and higher area efficiency.
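As a rough sanity check on the two headline figures, the sketch below relates latency and throughput. It assumes, and the abstract does not confirm this, that the reported throughput is the total operation count of one frame divided by the single-frame latency at batch size 1. The function names are hypothetical and used only for illustration.

```python
def implied_gop_per_frame(throughput_gops: float, latency_ms: float) -> float:
    """Operations per frame (in GOP) implied by throughput x latency.

    Assumption: throughput = (operations per frame) / (single-frame latency).
    """
    return throughput_gops * latency_ms / 1000.0


def frames_per_second(latency_ms: float) -> float:
    """Frame rate if frames are processed strictly one after another."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    gop = implied_gop_per_frame(2704.0, 24.36)  # figures from the abstract
    fps = frames_per_second(24.36)
    print(f"implied workload: {gop:.2f} GOP per frame")
    print(f"sequential frame rate: {fps:.2f} frames/s")
```

Under this assumption the implied workload is about 65.9 GOP per frame, which is close to the commonly quoted operation count for YOLOv3 at a 416×416 input. The abstract does not state the input resolution, and a sparse accelerator may report effective rather than executed operations, so this is a plausibility check only.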
Authors
ZHENG Sijie (郑思杰)
LI Jie (李杰)
HE Guanghui (贺光辉)
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240; Shanghai Academy of Spaceflight Technology (SAST), Shanghai 201109
Source
Modern Computer (《现代计算机》)
2021, No. 18, pp. 38-43 (6 pages)
Funding
National Key Research and Development Program of China (No. 2019YFB2204500)
Shanghai Aerospace Advanced Technology Joint Research Fund (No. USCAST2019-28)
Keywords
FPGA Accelerator
Object Detection
Convolutional Neural Network
Low Latency
Sparse Computation