Abstract
To improve the processing speed of convolutional neural networks, a convolution method based on zero-gradient approximation (gradient convolution) is used to increase the data reuse rate and reduce the amount of computation. Gradients are computed on the data at the granularity of the convolution kernel, and a flexible gradient-threshold strategy is adopted for different layers of different networks so that the convolution results of adjacent windows can be reasonably reused. The key gradient-processing module and the convolution computation are implemented on a Field-Programmable Gate Array (FPGA) and combined with a systolic array to improve resource utilization, and a dataflow suited to gradient convolution is designed to address the load-imbalance problem. In an object detection experiment based on the YOLOv3 model and the Pascal VOC dataset, at the cost of a small loss of accuracy, the computation on the software side is reduced by about 23.2%, and the acceleration ratio combined with the hardware is about 17.8%.
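The abstract does not give the exact formulation of gradient convolution. As a rough illustration of the core idea of reusing an adjacent window's convolution result when the window-level gradient is small, a minimal NumPy sketch follows. The function name grad_convolution, the single-channel 2D input, the mean-absolute-difference gradient metric, and the fixed grad_threshold are assumptions for illustration only; the paper's per-layer threshold policy and hardware dataflow are not modeled here.

```python
import numpy as np

def grad_convolution(feature, kernel, stride=1, grad_threshold=0.1):
    """Sketch of gradient convolution (assumed formulation): if the patch
    under the current window differs from the previous window's patch by
    less than a threshold, reuse the previous output instead of recomputing
    the dot product."""
    k = kernel.shape[0]
    h, w = feature.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((out_h, out_w))

    for i in range(out_h):
        prev_patch = None
        prev_val = 0.0
        for j in range(out_w):
            patch = feature[i*stride:i*stride+k, j*stride:j*stride+k]
            if prev_patch is not None:
                # "Gradient" of the window: mean absolute difference to the
                # previous window (one plausible definition, not necessarily
                # the paper's exact metric).
                grad = np.mean(np.abs(patch - prev_patch))
                if grad < grad_threshold:
                    out[i, j] = prev_val   # reuse the neighbouring result
                    prev_patch = patch
                    continue
            prev_val = float(np.sum(patch * kernel))
            out[i, j] = prev_val
            prev_patch = patch
    return out
```

Under this assumed scheme, a larger grad_threshold skips more multiply-accumulate work at the price of a larger approximation error, which is consistent with the accuracy/computation trade-off reported in the abstract.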
Authors
CAI Yuanpeng; SUN Wenhao; CHEN Song (School of Microelectronics, University of Science and Technology of China, Hefei 230026, China)
Source
Microelectronics & Computer (《微电子学与计算机》)
2024, No. 4, pp. 104-111 (8 pages)
Funding
National Key Research and Development Program of China (2019YFB2204800)
National Natural Science Foundation of China (61931008)
Keywords
accelerator
local similarity of data
convolutional neural network
gradient convolution
field-programmable gate array (FPGA)