Design of a convolutional neural network acceleration system based on a heterogeneous platform (cited by: 2)

Abstract: Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources leads to slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform, and designs and implements a lightweight CNN acceleration system based on MobileNet. First, to reduce hardware resource consumption and data transfer costs, the network model is optimized by combining dynamic fixed-point quantization with batch normalization fusion, which also reduces the hardware design complexity of the acceleration system. Second, convolution tiling, parallel convolution computation, and data flow optimization are implemented to improve convolution efficiency and system throughput. Experimental results on the PYNQ-Z2 platform show that MobileNet inference on this acceleration system takes 0.18 s per image at a system power consumption of 2.62 W, a 128-fold speedup over a single-core ARM processor.
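The abstract names two model-level optimizations, batch normalization fusion and dynamic fixed-point quantization, without giving their details. The following is a minimal sketch of how these techniques are commonly done, not the authors' implementation. The function names, the flat per-channel weight layout, and the 8-bit word width are all assumptions made for illustration.

```python
import math


def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into the preceding convolution.

    weights[o] is a flat list of the weights of output channel o
    (hypothetical layout); all other arguments hold one value per channel.
    After folding, conv(x; folded_w, folded_b) equals BN(conv(x; w, b)).
    """
    folded_w, folded_b = [], []
    for o, w_o in enumerate(weights):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        folded_w.append([w * scale for w in w_o])
        folded_b.append((biases[o] - mean[o]) * scale + beta[o])
    return folded_w, folded_b


def choose_frac_bits(values, total_bits=8):
    """Pick the fractional length for one tensor (the 'dynamic' part).

    The integer part gets just enough bits for the largest magnitude;
    the remaining bits (minus the sign bit) hold the fraction.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Round to signed fixed point and saturate to the word width."""
    qmin = -(1 << (total_bits - 1))
    qmax = (1 << (total_bits - 1)) - 1
    return [min(qmax, max(qmin, round(v * 2.0 ** frac_bits))) for v in values]


def dequantize(q, frac_bits):
    """Map fixed-point integers back to real values."""
    return [v * 2.0 ** -frac_bits for v in q]


if __name__ == "__main__":
    # Toy values, chosen only to exercise the functions.
    w, b = fold_batchnorm([[0.5, -0.25]], [0.1], [1.2], [0.05], [0.3], [0.9])
    frac = choose_frac_bits(w[0])
    print(frac, quantize(w[0], frac), quantize(b, choose_frac_bits(b)))
```

Folding removes the batch normalization layer from the inference path, so the hardware only needs to implement convolution with a bias. Choosing the fractional length per tensor lets a narrow word cover layers with very different value ranges.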

Authors: QIN Wen-qiang; WU Zhong-cheng; ZHANG Jun; LI Fang (Institute of Physical Science and Information Technology, Anhui University, Hefei 230601; Center for High Magnetic Field Science, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031; High Magnetic Field Laboratory of Anhui Province, Hefei 230031, China)

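Affiliations: Institute of Physical Science and Information Technology, Anhui University; Center for High Magnetic Field Science, Hefei Institutes of Physical Science, Chinese Academy of Sciences; High Magnetic Field Laboratory of Anhui Province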

Source: Computer Engineering & Science (indexed in CSCD and the Peking University Core Journals list), 2024, Issue 1, pp. 12-20 (9 pages)

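Funding: Key R&D Project of the Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003); Project of the Hefei Comprehensive National Science Center (QGCYY04).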

Keywords: field programmable gate array (FPGA); Vivado high-level synthesis; convolutional neural network; heterogeneous platform; hardware acceleration

Classification: TP368 [Automation and Computer Technology: Computer System Architecture]


