Abstract
In recent years, deep neural networks have been widely deployed in many fields, and each application scenario requires training a network model to obtain better parameters, so the demand for faster training keeps growing. However, existing research usually focuses on accelerating computation-intensive layers while neglecting memory-intensive layers, whose execution efficiency is determined mainly by memory bandwidth, so improving computation speed alone has little effect on their performance. From the perspective of execution order, this paper proposes fusing a memory-intensive layer with the computation-intensive layer before or after it into a single new layer, in which the memory-intensive operation is performed as pre-processing of the input data or post-processing of the output data of the fused layer. This greatly reduces the off-chip memory accesses of memory-intensive layers during training and improves performance. Based on this fusion scheme, a training-oriented accelerator is designed and implemented; it buffers the pre-processing results on chip and executes post-processing concurrently with the computation-intensive operations, further improving the training performance of the fused layer. Experimental results show that, at the cost of a 6.4% increase in area and a 10.3% increase in power, the forward and backward stages of training are accelerated by 67.7% and 77.6%, respectively.
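The abstract describes the fusion at the accelerator (hardware) level; purely as a rough software analogy, the NumPy sketch below (with hypothetical conv1x1 and relu helpers, not taken from the paper) contrasts a naive execution, in which the element-wise memory-intensive layer re-reads the full compute-intensive result from off-chip memory, with a fused execution that applies it as post-processing on each on-chip output tile before a single write-back.

```python
import numpy as np

def conv1x1(x, w):
    # Simplified compute-intensive layer: 1x1 convolution expressed as a matmul.
    # x: (pixels, in_channels), w: (in_channels, out_channels)
    return x @ w

def relu(y):
    # Memory-intensive layer: element-wise, bandwidth-bound.
    return np.maximum(y, 0.0)

def unfused_forward(x, w, dram):
    # Naive order: the conv result is written to off-chip memory,
    # then read back in full just to apply the element-wise ReLU.
    dram["conv_out"] = conv1x1(x, w)           # DRAM write
    dram["relu_out"] = relu(dram["conv_out"])  # DRAM read + another write
    return dram["relu_out"]

def fused_forward(x, w, dram, tile=64):
    # Fused order: ReLU is applied as post-processing on each on-chip
    # output tile, so only the final result touches off-chip memory.
    out = np.empty((x.shape[0], w.shape[1]), dtype=x.dtype)
    for i in range(0, x.shape[0], tile):
        tile_out = conv1x1(x[i:i + tile], w)   # stays in the on-chip buffer
        out[i:i + tile] = relu(tile_out)       # post-process before write-back
    dram["fused_out"] = out                    # single DRAM write
    return dram["fused_out"]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((256, 32)).astype(np.float32)
    w = rng.standard_normal((32, 64)).astype(np.float32)
    assert np.allclose(unfused_forward(x, w, {}), fused_forward(x, w, {}))
```

In the fused version every output tile is post-processed while still on chip, which mirrors the paper's idea of treating the memory-intensive operation as pre- or post-processing inside the fused layer rather than as a separate pass over off-chip memory.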
Authors
YANG Can (杨灿), WANG Chongxi (王重熙), ZHANG Longbing (章隆兵)
(State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
Source
《高技术通讯》
CAS
2023, No. 8, pp. 823-835 (13 pages)
Chinese High Technology Letters
Funding
Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC05020100).