Design of CNN accelerator with multi-core based on FPGA (基于FPGA的多核可扩展卷积加速器设计)

Cited by: 1

Abstract: To address the low computational and energy efficiency of convolutional neural networks (CNNs), a convolution accelerator that takes fixed-point data as input is proposed and designed. The accelerator supports convolution on dynamically quantized 8-bit fixed-point data, and improves computational efficiency through a tiling strategy and an improved loop ordering. It also supports activation, batch normalization (BN), pooling, and fully connected layers. Following a hardware/software co-design approach, an SoC system comprising the convolution accelerator and an ARM processor is designed. A method for extending the accelerator to multiple cores is proposed, which increases computing performance and makes the design easier to port across FPGA platforms. Deployed on the Xilinx ZCU102 development board, the single-core accelerator reaches 153.6 GOP/s; with 4 and 8 cores, performance rises to 614.4 GOP/s and 1024 GOP/s, respectively.
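The throughput figures in the abstract can be compared against ideal linear scaling with a little arithmetic. The sketch below is only an illustration built from the three reported numbers. The function name `scaling_efficiency` and the dictionary layout are assumptions made for this example, not something taken from the paper.

```python
# Throughput figures reported in the abstract (GOP/s), keyed by core count.
reported_gops = {1: 153.6, 4: 614.4, 8: 1024.0}


def scaling_efficiency(cores: int, gops: float, single_core_gops: float) -> float:
    """Return measured throughput divided by ideal linear scaling from one core.

    Hypothetical helper for illustration; 1.0 means perfectly linear scaling.
    """
    return gops / (cores * single_core_gops)


single = reported_gops[1]
efficiency = {
    n: scaling_efficiency(n, g, single) for n, g in reported_gops.items()
}

for n, eff in efficiency.items():
    print(f"{n} core(s): {reported_gops[n]:.1f} GOP/s, {eff:.1%} of linear scaling")
```

On these numbers the 4-core configuration matches linear scaling (4 × 153.6 = 614.4 GOP/s), while the 8-core configuration reaches about 83% of the linear figure (1024 of 1228.8 GOP/s). The abstract does not state the reason for the difference.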

Authors: ZHANG Kun-ning; ZHAO Shuo; SUN Qing-bin; DENG Ning; HE Hu (Institute of Microelectronics, Tsinghua University, Beijing 100084, China)

Affiliation: Institute of Microelectronics, Tsinghua University

Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2021, No. 6, pp. 1592-1598 (7 pages)

Funding: National Natural Science Foundation of China (91846303).

Keywords: convolution acceleration; data reuse; parallel computation; multi-core expansion; hardware and software co-design

Classification number: TP332 [Automation and Computer Technology: Computer System Architecture]

