
Design and Implementation of a Convolutional Neural Network Accelerator Based on FPGA
Cited by: 11
Abstract: The existing FPGA implementation of the convolutional neural network model ZynqNet suffers from low parallelism in its convolution units and a storage structure that relies too heavily on off-chip memory. This paper proposes an optimized FPGA design for ZynqNet that can also be applied readily to other CNN models. A double-buffering structure keeps intermediate results on chip to reduce off-chip memory accesses; the data width is reduced from 32 bits to 16 bits; and a parallel structure with 64 convolution units is designed to raise computing parallelism. Experimental results show that, at the same ImageNet test accuracy, the proposed design runs at up to 200 MHz and reaches a peak throughput of 1.85 GMAC/s, which is 10 times that of the original ZynqNet implementation and a 20-fold speedup over an i5-5200U CPU. Its energy efficiency is 5.4 times that of an NVIDIA GTX 970 GPU.
Authors: QIU Yue (仇越), MA Wen-tao (马文涛), CHAI Zhi-lei (柴志雷) (School of Internet of Things, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125)
Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journals, 2018, No. 8, pp. 68-72, 77 (6 pages)
Funding: Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2015A07)
Keywords: convolutional neural network (CNN); field-programmable gate array (FPGA); ZynqNet; parallel computing; acceleration
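
The double-buffering idea mentioned in the abstract can be illustrated with a minimal software sketch. This is not the paper's hardware design: it assumes each layer is an ordinary Python function over a list of numbers, and the names `run_layers`, `scale_by_two`, and `relu` are hypothetical stand-ins chosen for illustration. The point is only that two alternating buffers are enough to pass intermediate results from one layer to the next without writing them anywhere else.

```python
# Illustrative ping-pong (double) buffering between layers.
# Assumption: each "layer" is a plain Python function on a list of numbers;
# the design described in the abstract keeps feature maps in on-chip memory.

def run_layers(layers, input_data):
    """Run layers in sequence using two alternating buffers.

    Layer i reads from buffers[i % 2] and writes to buffers[(i + 1) % 2],
    so intermediate results only ever live in these two buffers.
    """
    buffers = [list(input_data), []]
    for i, layer in enumerate(layers):
        src = buffers[i % 2]
        buffers[(i + 1) % 2] = layer(src)
    # After len(layers) steps the latest output sits in this buffer.
    return buffers[len(layers) % 2]


# Hypothetical stand-ins for network layers.
def scale_by_two(values):
    return [2 * v for v in values]


def relu(values):
    return [max(0, v) for v in values]


result = run_layers([scale_by_two, relu, scale_by_two], [-1, 2, 3])
print(result)
```

In hardware the two buffers would be fixed-size on-chip memories and the swap would be a change of read and write ports rather than a list reassignment; the sketch only shows the alternating roles.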
