Research and Design of a Reconfigurable CNN Co-Processor for Edge Computing


Abstract: With the development of deep learning, the number of parameters and the amount of computation in Convolutional Neural Network (CNN) models have grown dramatically. This greatly raises the cost of deploying CNN algorithms on edge devices. This paper proposes a reconfigurable CNN co-processor architecture for edge computing that makes CNN deployment on edge devices easier and reduces inference latency and energy consumption.

The design uses a channel-wise data-flow pattern. On top of it, a two-level distributed storage scheme addresses the power overhead and performance degradation caused by two sources: large-scale migration of data on chip, and heavy data movement between Processing Elements (PEs) during reconfigurable computation. A complex data-interconnection propagation network in the PE array would raise control complexity. To avoid it, the paper proposes a flexible local memory-access mechanism and a padding mechanism based on address translation. Together, these let the co-processor perform conventional convolution, depthwise separable convolution, pooling, and fully connected operations of arbitrary size, which improves the flexibility of the hardware architecture.

The co-processor contains 256 PEs and 176 kB of on-chip private memory (SRAM). It was logic-synthesized and placed and routed in a 55 nm CMOS process at the TT corner (25 °C, 1.2 V). It reaches a maximum clock frequency of 328 MHz with an area of 4.41 mm². At 320 MHz, its peak performance is 163.8 GOPs and its area efficiency is 37.14 GOPs/mm². Its energy efficiency is 210.7 GOPs/W on LeNet-5 and 340.08 GOPs/W on MobileNet. These figures meet the energy-efficiency and performance requirements of edge intelligent computing scenarios.
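The abstract does not show how the peak-performance and area-efficiency numbers are derived. The following sketch checks the arithmetic under one assumption that is ours, not the paper's: each PE completes one multiply-accumulate (MAC) per clock cycle, and one MAC counts as two operations. The helper names are hypothetical.

```python
# Assumption (not stated in the abstract): each PE finishes one MAC per cycle,
# and one MAC is counted as two operations (a multiply and an add).
OPS_PER_MAC = 2


def peak_gops(num_pes: int, freq_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Peak throughput in GOPs when every PE issues one MAC every cycle."""
    return num_pes * ops_per_mac * freq_hz / 1e9


def area_efficiency(gops: float, area_mm2: float) -> float:
    """Throughput per unit of silicon area, in GOPs/mm^2."""
    return gops / area_mm2


if __name__ == "__main__":
    print(f"peak at 320 MHz: {peak_gops(256, 320e6):.2f} GOPs")             # 163.84
    print(f"area efficiency: {area_efficiency(163.8, 4.41):.2f} GOPs/mm^2")  # 37.14
    print(f"peak at 328 MHz: {peak_gops(256, 328e6):.2f} GOPs")             # 167.94
```

Under this assumption, 256 × 2 × 320 MHz gives 163.84 GOPs. The reported 37.14 GOPs/mm² matches dividing the rounded 163.8 GOPs by 4.41 mm². The abstract does not give power figures, so the energy-efficiency values cannot be reproduced this way.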
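The abstract names an "address-translation-based padding mechanism" but does not describe its internals. The sketch below illustrates the general idea under our own assumptions. A feature map is stored unpadded in row-major order. A hypothetical `translate_address` function maps each coordinate of a virtual zero-padded map onto that buffer and returns `None` for border positions, so a zero is supplied without any memory read. The names and the single-channel loop are illustrative only, not the authors' hardware design.

```python
from typing import List, Optional


def translate_address(r: int, c: int, pad: int, height: int, width: int) -> Optional[int]:
    """Map (r, c) in the virtual padded map to a flat address in the unpadded
    row-major buffer, or return None if (r, c) lies in the padding border."""
    r0, c0 = r - pad, c - pad
    if 0 <= r0 < height and 0 <= c0 < width:
        return r0 * width + c0
    return None


def read_padded(buf: List[int], r: int, c: int, pad: int, height: int, width: int) -> int:
    """Return the stored value, or 0 for a padding position (no memory access)."""
    addr = translate_address(r, c, pad, height, width)
    return 0 if addr is None else buf[addr]


def conv2d_one_channel(buf: List[int], height: int, width: int,
                       kernel: List[List[int]], pad: int, stride: int = 1) -> List[List[int]]:
    """Single-channel K x K convolution whose reads all go through the translator."""
    k = len(kernel)
    out_h = (height + 2 * pad - k) // stride + 1
    out_w = (width + 2 * pad - k) // stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            acc = 0
            for ky in range(k):
                for kx in range(k):
                    acc += kernel[ky][kx] * read_padded(
                        buf, oy * stride + ky, ox * stride + kx, pad, height, width)
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    ones = [1] * 9             # 3 x 3 input stored without a padding border
    kernel = [[1, 1, 1]] * 3   # 3 x 3 all-ones kernel (rows are only read)
    print(conv2d_one_channel(ones, 3, 3, kernel, pad=1))
    # [[4, 6, 4], [6, 9, 6], [4, 6, 4]]
```

The point of the design is that zero borders are never written into on-chip memory. The address generator decides per access whether to fetch or substitute zero, so the same buffer can serve any padding size.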
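Depthwise separable convolution, which the co-processor supports and which MobileNet relies on, is much cheaper than conventional convolution. The standard MAC-count formulas make the difference concrete. In the sketch below, the layer shape (56 × 56 output, 3 × 3 kernel, 128 input and 128 output channels) is a hypothetical example, not a layer taken from the paper.

```python
def conv_macs(h_out: int, w_out: int, k: int, c_in: int, c_out: int) -> int:
    """MACs for a conventional K x K convolution producing c_out output channels."""
    return h_out * w_out * k * k * c_in * c_out


def dw_separable_macs(h_out: int, w_out: int, k: int, c_in: int, c_out: int) -> int:
    """MACs for a depthwise K x K convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution mixing channels."""
    depthwise = h_out * w_out * k * k * c_in
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    shape = (56, 56, 3, 128, 128)  # hypothetical layer: H_out, W_out, K, C_in, C_out
    std = conv_macs(*shape)
    sep = dw_separable_macs(*shape)
    print(std, sep, f"{std / sep:.2f}x fewer MACs")  # 462422016 54992896 8.41x fewer MACs
```

The saving factor equals 1 / (1/C_out + 1/K²), so for 3 × 3 kernels it approaches 9 as the channel count grows. The depthwise step works on each channel independently, so it maps naturally onto a channel-wise data flow.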

Authors: LI Wei, CHEN Yi, CHEN Tao, NAN Longmei, DU Yiran (School of Cryptographic Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450001, China)

Source: Journal of Electronics & Information Technology, 2024, No. 4, pp. 1499-1512 (14 pages). Indexed in EI, CAS, and CSCD; Peking University Core Journal.

Funding: Key Basic Research Project of the Foundation Strengthening Program (2019-JCJQ-ZD-187-00-02).

Keywords: hardware acceleration; Convolutional Neural Network (CNN); reconfigurable; ASIC

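CLC numbers: TN492 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]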
