
Design of convolutional neural network acceleration system based on heterogeneous platform (cited by: 2)
Abstract Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources suffers from slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform, and designs and implements a lightweight CNN acceleration system based on MobileNet. First, to reduce hardware resource consumption and data transmission costs, dynamic fixed-point quantization and batch normalization fusion are used to optimize the network model, which also reduces the hardware design complexity of the acceleration system. Second, convolution tiling, parallel convolution computation, and data flow optimization are implemented to improve the efficiency of the convolution operations and the system throughput. Experimental results on the PYNQ-Z2 platform show that the MobileNet inference scheme implemented by this acceleration system recognizes a single image in 0.18 s at a system power consumption of 2.62 W, a 128-fold speedup over a single-core ARM processor.
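The abstract names batch normalization fusion and dynamic fixed-point quantization but does not give their details. The following is a minimal sketch of the two ideas in their common textbook form, not the authors' implementation. The function names, the 8-bit word length, and the per-tensor choice of fractional length are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def fold_batchnorm(weights: List[float], bias: float, gamma: float, beta: float,
                   mean: float, var: float, eps: float = 1e-5) -> Tuple[List[float], float]:
    """Fold the BN parameters of one output channel into its conv weights and bias.

    BN(conv(x)) = gamma * (w.x + b - mean) / sqrt(var + eps) + beta,
    which is again an affine function of x, so it can be computed by a
    single convolution with rescaled weights and a shifted bias.
    """
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


def choose_frac_bits(values: List[float], total_bits: int = 8) -> int:
    """Pick the fractional length so that the largest magnitude still fits.

    "Dynamic" here means each tensor gets its own fractional length
    (an assumption; the abstract does not state the granularity).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits left of the binary point
    return total_bits - 1 - int_bits               # one bit is reserved for the sign


def quantize(values: List[float], frac_bits: int, total_bits: int = 8) -> List[int]:
    """Round to signed fixed-point codes and saturate to the word length."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(codes: List[int], frac_bits: int) -> List[float]:
    """Map fixed-point codes back to real values."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    # Hypothetical single-channel parameters, chosen only for the demo.
    w, b = fold_batchnorm([1.0, -2.0], 0.5, gamma=4.0, beta=0.1,
                          mean=0.25, var=4.0, eps=0.0)
    fl = choose_frac_bits(w + [b])
    codes = quantize(w + [b], fl)
    print(w, b, fl, codes, dequantize(codes, fl))
```

Folding removes the separate normalization stage at inference time, and choosing the fractional length per tensor lets a short word length cover tensors with very different value ranges. Both are consistent with the abstract's stated goal of reducing hardware resource use and data transfer.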
Authors QIN Wen-qiang; WU Zhong-cheng; ZHANG Jun; LI Fang (Institute of Physical Science and Information Technology, Anhui University, Hefei 230601; Center for High Magnetic Field Science, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031; High Magnetic Field Laboratory of Anhui Province, Hefei 230031, China)
Source Computer Engineering & Science (CSCD; Peking University Core Journal), 2024, No. 1, pp. 12-20 (9 pages)
Funding Key Research and Development Project of the Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003); Project of the Hefei Comprehensive National Science Center (QGCYY04).
Keywords field programmable gate array (FPGA); Vivado high level synthesis; convolutional neural network; heterogeneous platform; hardware acceleration
