Abstract
Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources suffers from slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform and designs and implements a lightweight CNN acceleration system based on MobileNet. First, to reduce hardware resource consumption and data transfer cost, a design method combining dynamic fixed-point quantization and batch normalization fusion is used to optimize the network model and lower the hardware design complexity of the acceleration system. Second, convolution tiling, parallel convolution computation, and data-flow optimization effectively improve the efficiency of convolution operations and the system throughput. Experimental results on the PYNQ-Z2 platform show that the MobileNet inference acceleration scheme implemented on this system recognizes a single image in 0.18 s with a system power consumption of 2.62 W, a 128-fold speedup over an ARM single-core processor.
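The abstract names batch normalization fusion and dynamic fixed-point quantization as the model-side optimizations applied before deployment. The snippet below is a minimal NumPy sketch of these two ideas, not the authors' actual implementation: it folds a BN layer into the preceding convolution's weights and bias, and quantizes a tensor to a dynamic fixed-point format by picking a per-tensor fractional bit length from its dynamic range. The function names and the 8-bit word length are illustrative assumptions.

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding convolution.

    w: conv weights, shape (out_ch, in_ch, kh, kw)
    b: conv bias, shape (out_ch,) (zeros if the conv has no bias)
    gamma, beta, mean, var: BN parameters, shape (out_ch,)
    Returns (w_fused, b_fused) so that conv(x, w_fused) + b_fused
    equals BN(conv(x, w) + b).
    """
    scale = gamma / np.sqrt(var + eps)            # per-channel BN scale
    w_fused = w * scale[:, None, None, None]      # scale each output channel
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused

def dynamic_fixed_point_quantize(x, word_len=8):
    """Quantize a tensor to dynamic fixed-point: a fixed word length,
    with the fractional length chosen per tensor from its value range."""
    max_abs = np.max(np.abs(x))
    int_len = int(np.ceil(np.log2(max_abs + 1e-12))) + 1   # sign + integer bits
    frac_len = word_len - int_len                            # remaining fraction bits
    step = 2.0 ** (-frac_len)
    q_min, q_max = -(2 ** (word_len - 1)), 2 ** (word_len - 1) - 1
    q = np.clip(np.round(x / step), q_min, q_max)
    return q.astype(np.int32), frac_len           # integer codes and point position
```

In a PYNQ-class flow such optimizations would typically be applied offline, so that only the fused, fixed-point weights need to be transferred to the programmable logic at inference time.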
Authors
QIN Wen-qiang; WU Zhong-cheng; ZHANG Jun; LI Fang
(Institute of Physical Science and Information Technology, Anhui University, Hefei 230601, China; Center for High Magnetic Field Science, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; High Magnetic Field Laboratory of Anhui Province, Hefei 230031, China)
Source
Computer Engineering & Science (《计算机工程与科学》)
Indexed in: CSCD; Peking University Core Journals
2024, No. 1, pp. 12-20 (9 pages)
Funding
Key Research and Development Project of the Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003)
Hefei Comprehensive National Science Center Project (QGCYY04)