Abstract
To meet the performance and power requirements of Convolutional Neural Networks (CNNs) in Ultimate Edge Computing (UEC) scenarios, this paper proposes a CNN accelerator architecture for 16-bit quantized network models that does not rely on external memory. The architecture is built as a multi-core, fully pipelined CNN accelerator on a Field Programmable Gate Array (FPGA). On this basis, intra-layer mapping and inter-layer fusion optimizations are implemented. A resource evaluation model is then constructed to estimate the architecture's computing and memory resources theoretically. Guided by this model, design space exploration maximizes resource utilization and computing efficiency, fully exploiting the accelerator's peak computing power under the available computing resources. Finally, autonomous fast human detection on a nano Unmanned Aerial Vehicle (UAV), a typical UEC scenario, is used to verify and analyze the architecture experimentally. The results show that, when running inference of a human detection network based on the Single Shot MultiBox Detector (SSD), the accelerator achieves 137 frames per second at 100 MHz and 34 frames per second at 25 MHz, with power consumption of 0.514 W and 0.263 W, respectively. This meets the real-time image processing performance and power requirements of typical UEC scenarios such as autonomous computing on nano-UAVs.
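The reported figures can be cross-checked with simple arithmetic. The sketch below assumes (this is our assumption, not a statement from the abstract) that the number of clock cycles per frame is the same at both clock rates, which is plausible for a fully pipelined design that never stalls on external memory. It derives cycles per frame from the 100 MHz result, predicts the 25 MHz frame rate, and computes energy per frame from the reported power.

```latex
\begin{aligned}
N_{\text{cyc}} &= \frac{f_{\text{clk}}}{\text{FPS}}
  = \frac{100\times10^{6}}{137} \approx 7.30\times10^{5}\ \text{cycles/frame},\\
\text{FPS}_{25\,\text{MHz}} &\approx \frac{25\times10^{6}}{7.30\times10^{5}} \approx 34.2,\\
E_{100\,\text{MHz}} &= \frac{0.514\ \text{W}}{137\ \text{frames/s}} \approx 3.75\ \text{mJ/frame},\qquad
E_{25\,\text{MHz}} = \frac{0.263\ \text{W}}{34\ \text{frames/s}} \approx 7.74\ \text{mJ/frame}.
\end{aligned}
```

The predicted 34.2 frames per second agrees with the reported 34, consistent with throughput scaling linearly with the clock. Energy per frame roughly doubles at the lower clock even though average power drops, which suggests a sizeable clock-independent (static) power share; the abstract itself gives no power breakdown, so this reading is tentative.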
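To make "resource evaluation model plus design space exploration" concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' model. It assumes one DSP per 16-bit multiply-accumulate, power-of-two unroll factors `p_in`/`p_out` over input and output channels, and a layer-pipelined accelerator whose frame interval equals its slowest stage. All names (`ConvLayer`, `layer_cycles`, `cheapest_config`, `explore`, `weights_fit_on_chip`, `EXAMPLE_NET`) and the toy layer shapes are hypothetical. The memory check covers only on-chip weight storage, reflecting the no-external-memory constraint, and ignores feature-map buffers.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import ceil

# Hypothetical candidate unroll factors (assumption: powers of two).
FACTORS = (1, 2, 4, 8, 16, 32)


@dataclass(frozen=True)
class ConvLayer:
    name: str
    c_in: int   # input channels
    c_out: int  # output channels
    h_out: int  # output feature-map height
    w_out: int  # output feature-map width
    k: int      # square kernel size


def layer_cycles(layer: ConvLayer, p_in: int, p_out: int) -> int:
    """Cycles per frame with p_in x p_out parallel MACs (assumed: 1 DSP per 16-bit MAC)."""
    return (ceil(layer.c_in / p_in) * ceil(layer.c_out / p_out)
            * layer.h_out * layer.w_out * layer.k * layer.k)


def cheapest_config(layer: ConvLayer, target: int) -> tuple[int, int, int] | None:
    """Fewest DSPs (p_in * p_out) meeting the cycle target, as (dsp, p_in, p_out)."""
    best = None
    for p_in in FACTORS:
        for p_out in FACTORS:
            if layer_cycles(layer, p_in, p_out) <= target:
                dsp = p_in * p_out
                if best is None or dsp < best[0]:
                    best = (dsp, p_in, p_out)
    return best


def explore(layers: list[ConvLayer], dsp_budget: int):
    """Min-max DSE: the slowest pipeline stage sets the frame interval, so return the
    smallest achievable bottleneck whose per-layer configs fit the DSP budget, as
    (bottleneck_cycles, dsp_used, configs), or None if nothing fits."""
    targets = sorted({layer_cycles(layer, pi, po)
                      for layer in layers for pi in FACTORS for po in FACTORS})
    for t in targets:  # ascending, so the first feasible target is optimal
        configs = [cheapest_config(layer, t) for layer in layers]
        if all(c is not None for c in configs):
            used = sum(c[0] for c in configs)
            if used <= dsp_budget:
                return t, used, configs
    return None


def weights_fit_on_chip(layers: list[ConvLayer], onchip_bits: int, bits: int = 16) -> bool:
    """No external memory: all 16-bit weights must fit on chip (feature maps ignored)."""
    total = sum(layer.c_in * layer.c_out * layer.k * layer.k * bits for layer in layers)
    return total <= onchip_bits


# Hypothetical toy network; NOT the SSD model evaluated in the paper.
EXAMPLE_NET = [
    ConvLayer("conv1", 3, 16, 80, 80, 3),
    ConvLayer("conv2", 16, 32, 40, 40, 3),
    ConvLayer("conv3", 32, 64, 20, 20, 3),
]

if __name__ == "__main__":
    bottleneck, used, configs = explore(EXAMPLE_NET, dsp_budget=256)
    for layer, (dsp, p_in, p_out) in zip(EXAMPLE_NET, configs):
        print(f"{layer.name}: p_in={p_in} p_out={p_out} dsp={dsp}")
    f_clk = 100e6  # 100 MHz, one of the clock rates reported in the abstract
    print(f"bottleneck={bottleneck} cycles, DSPs={used}, ~{f_clk / bottleneck:.0f} fps")
    print("weights on chip:", weights_fit_on_chip(EXAMPLE_NET, onchip_bits=1_000_000))
```

For this toy network and a 256-DSP budget, the search settles on a 115,200-cycle bottleneck using only 160 DSPs. The coarse power-of-two factors leave part of the budget idle, which illustrates why a finer-grained design space matters when the goal is to maximize resource utilization.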
Authors
WU Ruidong
LIU Bing
FU Ping
JI Xinglong
LU Wenshuai
(School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150000, China; Qiyuan Laboratory, Beijing 100089, China)
Source
Journal of Electronics & Information Technology
EI
CSCD
Peking University Core Journal
2023, Issue 6, pp. 1933-1943 (11 pages)
Funding
National Natural Science Foundation of China (62171156).
Keywords
Ultimate Edge Computing (UEC)
Convolutional Neural Network (CNN)
Field Programmable Gate Array (FPGA)
Accelerator architecture