A Research and Design of Reconfigurable CNN Co-Processor for Edge Computing
Abstract: As deep learning advances, the parameter counts and computational load of Convolutional Neural Network (CNN) models have grown sharply, which greatly raises the cost of deploying CNN algorithms on edge devices. To ease deployment on edge devices and to reduce inference latency and energy consumption, this paper proposes a reconfigurable CNN co-processor architecture for edge computing. Built on a channel-wise data flow, the proposed two-level distributed storage scheme addresses the power overhead and performance degradation caused by large-scale on-chip data migration and by heavy data movement between PE units during reconfigurable operation. To avoid a complex data interconnection network in the PE array and to reduce control complexity, a flexible local memory access mechanism and a padding mechanism based on address translation are proposed. These allow the co-processor to perform conventional convolution, depthwise separable convolution, pooling, and fully connected operations of arbitrary size, which improves the flexibility of the hardware architecture. The proposed co-processor contains 256 Processing Elements (PEs) and 176 kB of on-chip private SRAM. Logic synthesis and place-and-route in a 55 nm CMOS process at the TT corner (25°C, 1.2 V) give a maximum clock frequency of 328 MHz and an area of 4.41 mm². At an operating frequency of 320 MHz, the peak performance is 163.8 GOPs and the area efficiency is 37.14 GOPs/mm². The energy efficiency when running LeNet-5 and MobileNet is 210.7 GOPs/W and 340.08 GOPs/W, respectively, which meets the energy-efficiency and performance requirements of edge intelligent computing scenarios.
Authors: LI Wei (李伟); CHEN Yi (陈億); CHEN Tao (陈韬); NAN Longmei (南龙梅); DU Yiran (杜怡然) (School of Cryptographic Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450001, China)
Source: Journal of Electronics & Information Technology (《电子与信息学报》; indexed in EI, CAS, CSCD, and the Peking University Core list), 2024, No. 4, pp. 1499-1512 (14 pages)
Funding: Key Basic Research Project of the Basic Strengthening Program (2019-JCJQ-ZD-187-00-02).
Keywords: Hardware acceleration; Convolutional Neural Network (CNN); Reconfigurable; ASIC
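The headline figures in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the paper. It assumes that each PE completes one multiply-accumulate per cycle and that a multiply-accumulate counts as two operations. Under that assumption, 256 PEs at 320 MHz give about 163.84 GOPs, which agrees with the reported 163.8 GOPs, and dividing the reported 163.8 GOPs by the 4.41 mm² area reproduces the reported 37.14 GOPs/mm². The function names are hypothetical and exist only for illustration.

```python
# Cross-check of the abstract's peak-performance and area-efficiency figures.
# ASSUMPTION (not stated in the abstract): each PE performs one
# multiply-accumulate (MAC) per cycle, and one MAC is counted as 2 operations.

NUM_PES = 256            # from the abstract
CLOCK_HZ = 320e6         # operating frequency from the abstract
AREA_MM2 = 4.41          # implemented area from the abstract
REPORTED_PEAK_GOPS = 163.8
OPS_PER_PE_PER_CYCLE = 2  # assumed: 1 multiply + 1 add per cycle


def peak_gops(num_pes, clock_hz, ops_per_pe_per_cycle=OPS_PER_PE_PER_CYCLE):
    """Peak throughput in giga-operations per second."""
    return num_pes * ops_per_pe_per_cycle * clock_hz / 1e9


def area_efficiency(gops, area_mm2):
    """Throughput per unit area in GOPs/mm^2."""
    return gops / area_mm2


if __name__ == "__main__":
    peak = peak_gops(NUM_PES, CLOCK_HZ)
    print(f"computed peak performance: {peak:.2f} GOPs")
    eff = area_efficiency(REPORTED_PEAK_GOPS, AREA_MM2)
    print(f"area efficiency from reported peak: {eff:.2f} GOPs/mm^2")
```

The energy-efficiency figures for LeNet-5 and MobileNet cannot be checked this way, because they depend on the measured throughput and power of each network, which the abstract does not give.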