Abstract
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to neighboring units within a local receptive field, which makes it perform well in large-scale image processing. This paper designs a CNN accelerator based on the Zynq chip, with the aim of accelerating CNN computation on an FPGA with limited resources and power budget. The accelerator uses data quantization to convert the network parameters from 64-bit double-precision floating-point numbers to 16-bit fixed-point numbers. Different hardware structures and optimization strategies are designed according to the characteristics and requirements of the different CNN layers. The convolutional and fully connected layers are further optimized with loop tiling, loop pipelining, and loop unrolling, while the pooling layer is optimized with pipelining. A buffering strategy between the FPGA and external memory is also designed to reduce the amount of data transferred between them. Taking image recognition on the CIFAR-10 dataset as an example, board-level tests were performed on a Zynq7020 experimental platform. The results show that at an operating frequency of 100 MHz the average recognition time is 15.5 ms, a 144-fold speedup over a single-core CPU implementation.
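The abstract states only that parameters are converted from 64-bit doubles to 16-bit fixed-point numbers. It does not give the split between integer and fractional bits. The sketch below shows what such a conversion typically looks like. The fractional-bit count `frac_bits` and the function names `quantize_fixed16` / `dequantize_fixed16` are hypothetical, and so are the round-to-nearest and saturation choices. None of them is the authors' documented scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a double to signed 16-bit fixed point with frac_bits fractional bits.
   The Q-format (frac_bits) is an assumption for illustration; the source only
   states a 64-bit double to 16-bit fixed-point conversion. */
int16_t quantize_fixed16(double x, int frac_bits) {
    assert(frac_bits >= 0 && frac_bits < 16);
    double scaled = x * (double)(1 << frac_bits);
    /* Saturate out-of-range values instead of letting them wrap around. */
    if (scaled >= INT16_MAX) return INT16_MAX;
    if (scaled <= INT16_MIN) return INT16_MIN;
    /* Round to nearest (ties away from zero); the cast truncates toward zero. */
    return (int16_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Recover the approximate real value represented by a fixed-point number. */
double dequantize_fixed16(int16_t q, int frac_bits) {
    assert(frac_bits >= 0 && frac_bits < 16);
    return (double)q / (double)(1 << frac_bits);
}
```

For example, with `frac_bits = 8`, the weight 0.1 is stored as 26, which reads back as 0.1015625. The rounding error is at most 2^-9, at the cost of a representable range of roughly ±128.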
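Loop tiling, pipelining, and unrolling are named for the convolutional layer but not spelled out. The following C sketch shows one common arrangement. The output channels are tiled by `TM`. Each tile's weights are copied once into a small local buffer, which stands in for on-chip BRAM, and the innermost `tm` loop is the candidate for full unrolling into `TM` parallel multiply-accumulates. The layer sizes, the tile size, the choice to tile over output channels, and where HLS directives would go are all assumptions made for illustration. The real accelerator's tiling factors are not given.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layer dimensions, chosen small for illustration. */
#define N_IN  2                 /* input channels */
#define M_OUT 4                 /* output channels */
#define IN_H  5                 /* input height */
#define IN_W  5                 /* input width */
#define K     3                 /* square kernel, stride 1, no padding */
#define OH    (IN_H - K + 1)    /* output height */
#define OW    (IN_W - K + 1)    /* output width */
#define TM    2                 /* output-channel tile size (hypothetical) */

static_assert(M_OUT % TM == 0, "TM must divide M_OUT");

/* 16-bit fixed-point inputs and weights, 32-bit accumulators. */
void conv_tiled(int16_t in[N_IN][IN_H][IN_W],
                int16_t w[M_OUT][N_IN][K][K],
                int32_t out[M_OUT][OH][OW]) {
    /* Loop tiling: handle TM output channels per outer iteration. */
    for (int m0 = 0; m0 < M_OUT; m0 += TM) {
        /* Local weight buffer for this tile; on an FPGA it would sit in
           on-chip memory and be filled once per tile from external memory. */
        int16_t wbuf[TM][N_IN][K][K];
        memcpy(wbuf, &w[m0], sizeof wbuf);

        for (int r = 0; r < OH; r++) {
            for (int c = 0; c < OW; c++) {
                /* An HLS PIPELINE directive would typically target this level. */
                int32_t acc[TM] = {0};
                for (int n = 0; n < N_IN; n++)
                    for (int i = 0; i < K; i++)
                        for (int j = 0; j < K; j++)
                            /* Fully unrolling this loop gives TM parallel MACs. */
                            for (int tm = 0; tm < TM; tm++)
                                acc[tm] += (int32_t)in[n][r + i][c + j] * wbuf[tm][n][i][j];
                for (int tm = 0; tm < TM; tm++)
                    out[m0 + tm][r][c] = acc[tm];
            }
        }
    }
}
```

Tiling over output channels means each input pixel fetched from memory is reused `TM` times per load. This is one way such a design cuts external-memory traffic. A fully connected layer fits the same loop nest with `K = 1` and a 1x1 spatial size.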
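The pooling layer is described only as "pipelined". The sketch below assumes 2x2 max pooling with stride 2, a common choice that the abstract does not confirm, and writes it as a single raster-order pass. Each input pixel is read once. The output array doubles as the running-max buffer, so the loop has a fixed access pattern that an HLS tool can pipeline. The function name `maxpool2x2` and the single-channel, row-major layout are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2x2 max pooling, stride 2, on one channel stored row-major (h rows, w cols).
   Each input pixel is read exactly once in raster order; the output array
   holds the running maximum of each 2x2 window while it is being scanned. */
void maxpool2x2(const int16_t *in, int16_t *out, int h, int w) {
    assert(h % 2 == 0 && w % 2 == 0);
    int ow = w / 2;
    for (int r = 0; r < h; r++) {
        for (int c = 0; c < w; c++) {
            int16_t v = in[r * w + c];
            int16_t *dst = &out[(r / 2) * ow + c / 2];
            if (r % 2 == 0 && c % 2 == 0)
                *dst = v;            /* top-left pixel starts the window's max */
            else if (v > *dst)
                *dst = v;            /* later pixels update the running max */
        }
    }
}
```

Because pixels arrive in the same order they would stream from the preceding convolution, this form needs no extra reordering buffer.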
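The headline numbers can be turned into an implied CPU time and a throughput. This assumes the 144-fold speedup is the ratio of average per-image recognition times. Neither derived value is reported in the abstract itself.

$$
t_{\text{CPU}} \approx 144 \times 15.5\ \text{ms} = 2232\ \text{ms}, \qquad
\text{throughput}_{\text{FPGA}} \approx \frac{1000\ \text{ms/s}}{15.5\ \text{ms/image}} \approx 64.5\ \text{images/s}
$$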
Authors
XU Jie; ZHANG Zi-heng; WANG Xin-yu; TONG Cheng; MEI Qing; XIAO Jian (School of Electronic and Optical Engineering, School of Microelectronics, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source
Computer Technology and Development (《计算机技术与发展》), 2021, Issue 11, pp. 108-113, 121 (7 pages)
Funding
National Natural Science Foundation of China, General Program (61974073).