
Research on a CGRA Accelerator for Sparse Convolutional Neural Networks
Abstract: To address sparse convolutional neural network (CNN) applications of ever-growing scale and rapid evolution, this paper proposes DyCNN, an energy-efficient and flexible accelerator that improves their performance and energy efficiency. DyCNN is built on a coarse-grained reconfigurable architecture (CGRA), which combines flexibility with high energy efficiency, and exploits its high instruction-level parallelism to support CNN operations efficiently. DyCNN uses a data-aware dynamic instruction filtering mechanism to eliminate, within each processing unit, the large number of invalid computation and memory-access instructions caused by the static sparsity of weights and the dynamic sparsity of activations in sparse CNNs, so that the processing units can reuse a single set of instructions as efficiently as when executing a dense network. In addition, DyCNN adopts a load-aware scheduling strategy that combines static work distribution with dynamic work stealing to resolve the load imbalance induced by sparsity. Experimental results show that DyCNN achieves an average 1.69x speedup and 3.04x energy savings when running sparse CNNs compared with running dense CNNs, and achieves 2.78x and 1.48x speedups and 35.62x and 1.17x energy savings over the state-of-the-art GPU (cuSPARSE) and Cambricon-X solutions, respectively.
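As a reading aid (not part of the published paper), the minimal Python sketch below illustrates the two mechanisms the abstract describes: a data-aware filter that skips multiply-accumulate operations whose weight or activation operand is zero, and a single work-stealing step that rebalances per-unit task queues after a static split. The function names and the task-queue model are hypothetical and are not taken from DyCNN.

# Minimal sketch, not the DyCNN implementation. Illustrates (1) data-aware
# filtering of zero-operand MACs and (2) one dynamic work-stealing step.
from collections import deque

def sparse_dot(weights, activations):
    """MAC loop that filters out operations with a zero operand.
    In DyCNN this filtering happens at the instruction level inside each
    processing unit; here it is modeled as a plain conditional."""
    acc = 0
    for w, a in zip(weights, activations):
        if w != 0 and a != 0:      # data-aware filter: skip invalid MACs
            acc += w * a
    return acc

def balance(queues):
    """One work-stealing step: the most loaded queue donates half of its
    surplus tasks to the least loaded one (static distribution assumed
    to have happened earlier)."""
    busiest = max(queues, key=len)
    idlest = min(queues, key=len)
    surplus = (len(busiest) - len(idlest)) // 2
    for _ in range(surplus):
        idlest.append(busiest.pop())
    return queues

if __name__ == "__main__":
    print(sparse_dot([0, 2, 0, 3], [5, 0, 7, 1]))   # only the last MAC survives -> 3
    qs = [deque(range(8)), deque(range(2))]
    print([len(q) for q in balance(qs)])            # rebalanced -> [5, 5]

The zero-operand test above captures the condition that sparsity exploitation relies on; in the accelerator the same condition is used to drop the corresponding computation and memory-access instructions in hardware rather than in software.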
Authors: TAN Long; YAN Mingyu; WU Xinxin; LI Wenming; WU Haibin; FAN Dongrui (State Key Laboratory of Processor, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
Source: Chinese High Technology Letters (《高技术通讯》), indexed in CAS and the Peking University Core Journal list, 2024, Issue 2, pp. 173-186 (14 pages)
Funding: National Natural Science Foundation of China (62202451); CAS Project for Young Scientists in Basic Research (YSBR-029); Youth Innovation Promotion Association, Chinese Academy of Sciences.
Keywords: sparse convolutional neural network (CNN); dedicated accelerator; coarse-grained reconfigurable architecture (CGRA); dynamic instruction filtering; dynamic workload balance