Design of high parallel CNN accelerator based on FPGA for AIoT


Abstract: Deploying a convolutional neural network (CNN) on a field-programmable gate array (FPGA) is challenging because of the network's computational complexity. To address this, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. It uses a pipelined architecture with three dimensions of parallelism: input channels, output channels, and convolution kernels. Firstly, two multiply-and-accumulate (MAC) operations were packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the accelerator. Secondly, feature map block partitioning and a special memory arrangement were proposed to reduce the total amount of off-chip memory access and relieve the pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiply-add tree with the Winograd fast convolution algorithm was designed to balance hardware resource consumption against computational performance. The accelerator was deployed on an Alinx ZU3EG platform, with the YOLOv3-tiny algorithm as the test object, and achieved an average computing performance of 127.5 giga operations per second (GOPS). The experimental results show that the hardware architecture effectively improves CNN computational power and outperforms other existing schemes in power consumption and in the efficiency of DSP and block random access memory (BRAM) usage.
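The abstract does not say how the two MAC operations share one DSP block. The sketch below shows one common way such packing can work, and it is not necessarily the authors' scheme. It assumes signed 8-bit operands, a shared second operand `b`, and an 18-bit offset modelled on a 27x18-bit DSP multiplier. The name `packed_dual_multiply` and the constant `SHIFT` are hypothetical and do not come from the paper.

```python
from typing import Tuple

# Assumed widths (not stated in the abstract): signed 8-bit operands and an
# 18-bit offset between the two packed values.
SHIFT = 18


def packed_dual_multiply(a: int, d: int, b: int) -> Tuple[int, int]:
    """Return (a*b, d*b) using a single wide multiplication."""
    for v in (a, d, b):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in signed 8 bits")
    packed = (a << SHIFT) + d            # both values share one multiplier input
    product = packed * b                 # equals (a*b << SHIFT) + d*b
    low = product & ((1 << SHIFT) - 1)   # low field holds d*b modulo 2**SHIFT
    if low >= 1 << (SHIFT - 1):          # reinterpret the low field as signed
        low -= 1 << SHIFT
    high = (product - low) >> SHIFT      # remove the borrow caused by a negative d*b
    return high, low


print(packed_dual_multiply(3, -5, 7))    # (21, -35)
```

The correction step matters: when `d*b` is negative it borrows from the upper field, so the upper product cannot simply be read off by shifting. In hardware this is typically handled with a small adder after the DSP output.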

Authors: Lin Zhijian, Gao Xuewei, Chen Xiaopei, Zhu Zhipeng, Du Xiaoyong, Chen Pingping

Affiliations: School of Advanced Manufacturing; College of Physics and Information Engineering

Source: The Journal of China Universities of Posts and Telecommunications (English edition of 中国邮电高校学报), indexed in EI and CSCD, 2022, Issue 5, pp. 1-9, 61 (10 pages in total)

Funding: Supported by the National Natural Science Foundation of China (61871132, 62171135).

Keywords: artificial intelligence of things (AIoT); convolutional neural network (CNN) accelerator; Winograd convolution; field-programmable gate array (FPGA)

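Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]; TN791 [Electronics and Telecommunications: Circuits and Systems]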
