Abstract
To address the heavy computation, long run time, and high resource demands of convolutional neural networks (CNNs) in image recognition algorithms, this paper proposes a CNN hardware accelerator design based on a field-programmable gate array (FPGA). Building on an existing network, batch normalization (BN) is added to the convolutional layers as part of the trained model structure, which effectively addresses gradient explosion and speeds up network convergence. A dynamic fixed-point quantization method is designed, in which floating-point values in the convolution are quantized to fixed point before the convolution is computed, and the acceleration achieved on different hardware platforms is studied. The hardware structure, which uses a parallel pipelined computing method, is designed on an XC7Z020 development board with an FPGA high-level synthesis (HLS) tool. The results show that the scheme effectively reduces look-up table (LUT) and register usage and computes about 10 times faster than a CPU.
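The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of one common form of dynamic fixed-point quantization, not the authors' implementation. It assumes an 8-bit signed word and a per-tensor fractional length chosen so that the largest magnitude still fits; the function names (`choose_frac_bits`, `quantize`, `fixed_point_dot`) and the sample values are hypothetical.

```python
import math


def choose_frac_bits(values, word_bits=8):
    """Pick a fractional length so the largest magnitude fits in the word.

    Assumption: one fractional length per tensor (e.g. per layer's weights).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_bits - 1
    # Bits needed left of the binary point (may be negative for small values).
    int_bits = math.floor(math.log2(max_abs)) + 1
    # One bit is reserved for the sign.
    return word_bits - 1 - int_bits


def quantize(values, frac_bits, word_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed word range."""
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]


def fixed_point_dot(xq, wq, x_frac, w_frac):
    """Integer multiply-accumulate, rescaled to a real value at the end."""
    acc = sum(a * b for a, b in zip(xq, wq))
    return acc / 2.0 ** (x_frac + w_frac)


# Hypothetical sample data: one 3-tap slice of a convolution window.
weights = [0.75, -0.5, 0.125]
inputs = [1.5, -2.0, 3.0]

w_frac = choose_frac_bits(weights)
x_frac = choose_frac_bits(inputs)
wq = quantize(weights, w_frac)
xq = quantize(inputs, x_frac)
result = fixed_point_dot(xq, wq, x_frac, w_frac)
```

Because the fractional length follows each tensor's range, small-valued weights keep more fractional precision than a single fixed format would allow, while the multiply-accumulate itself runs on integers, which is the part that maps naturally onto FPGA logic.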
Authors
马晓光
蒋占军
MA Xiao-guang; JIANG Zhan-jun (School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)
Source
《兰州交通大学学报》
CAS
2021, No. 5, pp. 51-57 (7 pages)
Journal of Lanzhou Jiaotong University
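Funding
Gansu Province Radio Monitoring and Positioning Innovation Team (2017C-09)
Lanzhou Jiaotong University "Hundred Outstanding Young Talents Training Program" Fund (150220232).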
Keywords
convolutional neural network
dynamic fixed-point quantization
field programmable gate array
hardware accelerator