An FPGA-Based Microinstruction Sequence Driven Spaceborne Convolution Neural Network Accelerator

Abstract: Convolutional neural networks (CNNs) are an indispensable part of today's mainstream vision algorithms. With the evolution of space remote sensing technology, the main Earth-observation platform is gradually shifting from single satellites to constellations of light and small satellites. A constellation of several high-resolution satellites collects hundreds of terabytes (TB) of remote sensing image (RSI) data every day, and the traditional satellite-to-ground transmission mechanism can no longer keep pace with this volume of data. In-orbit satellites therefore need stronger on-board processing to handle increasingly complex observation missions. In RSI processing, CNN-based deep learning algorithms have become the mainstream method because of their excellent performance, but their compute-intensive and memory-intensive nature makes them difficult to deploy. Academia and industry have proposed many domain-specific acceleration methods for different application scenarios, and numerous FPGA (Field Programmable Gate Array) and ASIC (Application Specific Integrated Circuit) accelerators have been designed for edge and data-center workloads. Compared with ASICs, FPGAs offer greater flexibility and faster development iteration, which makes them well suited to spaceborne use. Even so, accelerating CNNs stably and efficiently on in-orbit satellites, where resources and energy are limited, remains a challenging problem.

In this paper, we propose a microinstruction-driven CNN accelerator for RSI processing on FPGA. The accelerator is built through hardware/software co-design and, under the limited storage bandwidth and computing resources available on satellites, optimizes three key aspects: microinstruction encoding, instruction-level parallelism (coarse-grained parallelism), and operation-level parallelism (fine-grained parallelism). At the software level, we propose an extensible microinstruction encoding format and a corresponding compilation method (the Micro Assembler). The encoding covers 14 instructions of 4 types, which schedule the dataflow between the accelerator's components. The Micro Assembler performs graph-level optimization on the CNN topology through convolutional loop tiling and operator fusion, and then generates microinstruction sequences that the accelerator can execute. At the hardware level, we design and implement an RTL (Register Transfer Level) CNN accelerator composed mainly of a micro controller and a logic operator. The micro controller executes different types of instructions in parallel through a 5-stage coarse-grained pipeline (Data Load, Data Fetch, Compute, Post Process, Write Back). The logic operator is a computing array built by cascading DSP48E1 hard cores; it executes convolution operations in parallel through a 32-stage fine-grained pipeline, and once the pipeline is filled it completes 32×32 MAC (multiply-accumulate) operations per clock cycle. The accelerator is evaluated on the Xilinx VX690T FPGA, a chip commonly used on satellites, and its design power consumption is 10.68 W. When accelerating YOLOV3Tiny, the runtime max throughput (RMT) reaches 378.63 GOP/s and the MAC efficiency (ME), i.e., the utilization efficiency of the computing resources, reaches 91.5%. Used as a coprocessor for the CNN object detection algorithm YOLOV3Tiny, the accelerator achieves an average accuracy of 0.9 on the RSI dataset and a detection speed of 102 frames/s. Compared with a typical GPU acceleration method, the accelerator is 14 times more energy efficient, and compared with similar FPGA accelerators it improves ME by more than 6.9%.
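The abstract describes an extensible microinstruction encoding with 14 instructions in 4 types but gives no bit layout. As a rough illustration only, the sketch below packs a hypothetical 32-bit word: a 2-bit type field (enough for 4 types), a 4-bit opcode (enough for 14 instructions), and two hypothetical 13-bit buffer-address fields. The field names, widths, and order are assumptions, not the paper's format.

```python
# Hypothetical microinstruction layout, most significant field first.
# This is an illustrative assumption, NOT the format used in the paper.
FIELDS = (
    ("itype", 2),    # 4 instruction types fit in 2 bits
    ("opcode", 4),   # 14 instructions fit in 4 bits (16 codes available)
    ("dst", 13),     # hypothetical on-chip destination address
    ("src", 13),     # hypothetical on-chip source address
)
WORD_BITS = sum(width for _, width in FIELDS)  # 32 with the widths above


def encode(fields: dict) -> int:
    """Pack named field values into one integer word, checking each width."""
    word = 0
    for name, width in FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word: int) -> dict:
    """Unpack a word back into named fields (least significant field first)."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

Keeping all widths in the single `FIELDS` table is one simple way to make such a format extensible: widening a field or appending a new one changes only that table, and `encode`/`decode` follow automatically.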
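The convolutional loop tiling performed by the Micro Assembler can be illustrated with the classic tiled loop nest. This sketch assumes stride 1, no padding, nested Python lists, and hypothetical tile sizes `tm`, `tn`, `tr`, `tc` (output channels, input channels, output rows, output columns); the paper's actual tiling parameters and loop order are not given in the abstract. Each outer iteration corresponds to one tile that would fit in on-chip buffers, and the tiled nest must give exactly the same result as the untiled one.

```python
def _zeros(m, r, c):
    return [[[0] * c for _ in range(r)] for _ in range(m)]


def _window_mac(x, w, m, n, r, c):
    """Multiply-accumulate one K x K window of input channel n for output channel m."""
    k = len(w[0][0])
    return sum(w[m][n][i][j] * x[n][r + i][c + j] for i in range(k) for j in range(k))


def conv2d_naive(x, w):
    """x: [N][H][W] input, w: [M][N][K][K] weights; stride 1, no padding."""
    n_in, h, wd = len(x), len(x[0]), len(x[0][0])
    m_out, k = len(w), len(w[0][0])
    rows, cols = h - k + 1, wd - k + 1
    out = _zeros(m_out, rows, cols)
    for m in range(m_out):
        for n in range(n_in):
            for r in range(rows):
                for c in range(cols):
                    out[m][r][c] += _window_mac(x, w, m, n, r, c)
    return out


def conv2d_tiled(x, w, tm, tn, tr, tc):
    """Same convolution computed tile by tile (tile sizes are hypothetical)."""
    n_in, h, wd = len(x), len(x[0]), len(x[0][0])
    m_out, k = len(w), len(w[0][0])
    rows, cols = h - k + 1, wd - k + 1
    out = _zeros(m_out, rows, cols)
    # Outer loops walk over tiles; on an accelerator, each tile's inputs and
    # weights would be loaded into on-chip buffers before it is computed.
    for m0 in range(0, m_out, tm):
        for n0 in range(0, n_in, tn):
            for r0 in range(0, rows, tr):
                for c0 in range(0, cols, tc):
                    # Inner loops compute one tile; min() handles partial edge tiles.
                    for m in range(m0, min(m0 + tm, m_out)):
                        for n in range(n0, min(n0 + tn, n_in)):
                            for r in range(r0, min(r0 + tr, rows)):
                                for c in range(c0, min(c0 + tc, cols)):
                                    out[m][r][c] += _window_mac(x, w, m, n, r, c)
    return out
```

Choosing `tm = tn = 32` would make the inner channel loops match the width of a 32×32 MAC array, but whether the paper tiles this way is an assumption.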
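The abstract names operator fusion without giving details. One common instance for YOLOv3-Tiny-style layers (convolution, batch normalization, leaky ReLU) is folding the batch-normalization parameters into the convolution weights and bias at compile time, so the hardware runs a single convolution followed by the activation. The sketch uses a 1×1 convolution on one pixel for brevity (folding works the same way per output channel for K×K kernels) and a leaky-ReLU slope of 0.1; whether the paper fuses exactly these operators is an assumption.

```python
import math


def conv1x1(x, weights, bias):
    """Per-output-channel dot product: a 1x1 convolution applied to one pixel."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization, one set of parameters per channel."""
    return [g * (v - mu) / math.sqrt(s2 + eps) + bt
            for v, g, bt, mu, s2 in zip(y, gamma, beta, mean, var)]


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the conv: w' = w * s, b' = (b - mean) * s + beta, s = gamma / sqrt(var + eps)."""
    fused_w, fused_b = [], []
    for row, b, g, bt, mu, s2 in zip(weights, bias, gamma, beta, mean, var):
        s = g / math.sqrt(s2 + eps)
        fused_w.append([wi * s for wi in row])
        fused_b.append((b - mu) * s + bt)
    return fused_w, fused_b


def leaky_relu(y, slope=0.1):
    return [v if v > 0 else slope * v for v in y]
```

The fused path `leaky_relu(conv1x1(x, *fold_bn(...)))` produces the same values as `leaky_relu(batch_norm(conv1x1(x, ...), ...))` up to floating-point rounding, while removing one full pass over the feature map.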
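The 5-stage coarse-grained pipeline lets different instructions occupy different stages at the same time. The sketch below compares completion times for a stream of identical instructions with and without overlap, using the stage names from the abstract and hypothetical per-stage latencies. It assumes unbounded buffering between stages (no back-pressure), which real hardware with finite buffers may not provide.

```python
STAGES = ("Data Load", "Data Fetch", "Compute", "Post Process", "Write Back")


def sequential_cycles(n_instr, latencies):
    """No overlap: every instruction runs all stages before the next one starts."""
    return n_instr * sum(latencies)


def pipelined_cycles(n_instr, latencies):
    """Overlapped execution. Instruction i enters stage k once it has left stage k-1
    and instruction i-1 has left stage k: F[i][k] = max(F[i][k-1], F[i-1][k]) + L[k]."""
    leave = [0] * len(latencies)  # time the previous instruction left each stage
    for _ in range(n_instr):
        t = 0  # time the current instruction left the previous stage
        for k, lat in enumerate(latencies):
            t = max(t, leave[k]) + lat
            leave[k] = t
    return leave[-1]
```

With hypothetical latencies `(4, 2, 8, 3, 4)` (one per entry in `STAGES`), ten instructions take 93 cycles pipelined instead of 210 sequentially. For identical instructions the result equals `sum(L) + (n - 1) * max(L)`, so the slowest stage sets the steady-state rate, which is why balancing stage latencies matters in this kind of design.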
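The logic operator's 32-stage fine-grained pipeline can be pictured as a chain of cascaded multiply-accumulate units, each adding its product to the partial sum passed down from the previous unit, similar to how DSP48E1 slices chain partial sums through their dedicated cascade path. This cycle-level model rests on assumptions the abstract does not state: each stage keeps a fixed weight (weight-stationary), inputs are skewed by one cycle per stage, and each stage has one pipeline register. It reproduces the behaviour the abstract describes: after a fill latency equal to the number of stages, the chain completes one dot product per cycle, and 32 such chains side by side would perform 32×32 MACs per cycle.

```python
def cascade_dot_products(vectors, weights):
    """Cycle-level model of a cascaded MAC chain with len(weights) stages.

    Stage i holds weights[i] (weight-stationary assumption). Element i of
    vector j enters stage i at cycle j + i, so each partial sum meets the next
    element exactly one cycle later. Returns (dot products, cycles used).
    """
    n_stages, n_vec = len(weights), len(vectors)
    regs = [0] * n_stages               # pipeline register after each stage
    outputs = []
    total_cycles = n_vec + n_stages - 1
    for t in range(total_cycles):
        new_regs = [0] * n_stages
        for i in range(n_stages):
            j = t - i                   # index of the vector at stage i this cycle
            a = vectors[j][i] if 0 <= j < n_vec else 0
            upstream = regs[i - 1] if i > 0 else 0
            new_regs[i] = upstream + a * weights[i]
        regs = new_regs                 # all registers update on the same clock edge
        done = t - (n_stages - 1)       # vector leaving the last stage this cycle
        if 0 <= done < n_vec:
            outputs.append(regs[-1])
    return outputs, total_cycles
```

For `n` input vectors the model needs `n + stages - 1` cycles, so with 32 stages the fill cost is amortized once the stream is long, which is consistent with the high MAC efficiency reported.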
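The throughput figures in the abstract can be related by a simple consistency check. Assuming, as is common for FPGA accelerators, that MAC efficiency is the measured throughput divided by a theoretical peak of 2 operations per MAC per cycle, and taking the 32×32 MAC array from the abstract, the reported RMT and ME imply a clock frequency of about 202 MHz (a peak of roughly 413.8 GOP/s). The abstract does not state the clock frequency, so this figure is an inference under that assumed definition, not a reported value.

```latex
\begin{aligned}
N_{\mathrm{MAC}} &= 32 \times 32 = 1024,\\
\mathrm{Peak} &= 2\, N_{\mathrm{MAC}}\, f_{\mathrm{clk}}, \qquad
\mathrm{ME} = \frac{\mathrm{RMT}}{\mathrm{Peak}},\\
f_{\mathrm{clk}} &= \frac{\mathrm{RMT}}{2\, N_{\mathrm{MAC}}\, \mathrm{ME}}
= \frac{378.63\ \mathrm{GOP/s}}{2 \times 1024 \times 0.915\ \mathrm{op/cycle}}
\approx 0.202\ \mathrm{GHz}.
\end{aligned}
```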

Authors: GUO Zi-Bo, LIU Kai, HU Hang-Tian, LI Yi-Duo, QU Ze-Xu (School of Computer Science and Technology, Xidian University, Xi'an 710000; CAST-Xi'an Institute of Space Radio Technology, Xi'an 710000)

Source: Chinese Journal of Computers, 2022, No. 10, pp. 2047-2064 (18 pages). Indexed in EI, CAS, CSCD, and the PKU Core Journals list.

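Funding: Supported by the National Natural Science Foundation of China (62171342, 61850410523) and the Space TT&C Communication Innovation and Exploration Fund (201701B).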

Keywords: CNN; microinstruction sequences; FPGA; remote sensing object detection; microprocessor design

CLC number: TP303 [Automation and Computer Technology: Computer System Architecture]
