Abstract
In recent years, deep neural networks have been widely deployed in many fields, and each application scenario requires training a network model to obtain better parameters, so the demand for faster training keeps growing. However, existing research usually focuses on accelerating computation-intensive layers while neglecting memory-intensive layers, whose execution efficiency is determined mainly by memory bandwidth, so improving computation speed alone has little effect on their performance. From the perspective of execution order, this paper proposes fusing a memory-intensive layer with the computation-intensive layer before or after it into a single new layer, in which the memory-intensive operation is performed as pre-processing of the input data or post-processing of the output data of the fused layer. This greatly reduces the off-chip memory accesses of memory-intensive layers during training and improves performance. Based on this fusion scheme, a training-oriented accelerator is designed and implemented; it buffers the pre-processing results on chip and executes post-processing concurrently with the computation-intensive operations, further improving the training performance of the fused layer. Experimental results show that, at the cost of a 6.4% increase in area and a 10.3% increase in power, the forward and backward stages of training are accelerated by 67.7% and 77.6%, respectively.
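The abstract describes the fusion at the accelerator (hardware) level; purely as a rough software analogy, the NumPy sketch below (with hypothetical conv1x1 and relu helpers, not taken from the paper) contrasts a naive execution, in which the element-wise memory-intensive layer re-reads the full compute-intensive result from off-chip memory, with a fused execution that applies it as post-processing on each on-chip output tile before a single write-back.

```python
import numpy as np

def conv1x1(x, w):
    # Simplified compute-intensive layer: 1x1 convolution expressed as a matmul.
    # x: (pixels, in_channels), w: (in_channels, out_channels)
    return x @ w

def relu(y):
    # Memory-intensive layer: element-wise, bandwidth-bound.
    return np.maximum(y, 0.0)

def unfused_forward(x, w, dram):
    # Naive order: the conv result is written to off-chip memory,
    # then read back in full just to apply the element-wise ReLU.
    dram["conv_out"] = conv1x1(x, w)           # DRAM write
    dram["relu_out"] = relu(dram["conv_out"])  # DRAM read + another write
    return dram["relu_out"]

def fused_forward(x, w, dram, tile=64):
    # Fused order: ReLU is applied as post-processing on each on-chip
    # output tile, so only the final result touches off-chip memory.
    out = np.empty((x.shape[0], w.shape[1]), dtype=x.dtype)
    for i in range(0, x.shape[0], tile):
        tile_out = conv1x1(x[i:i + tile], w)   # stays in the on-chip buffer
        out[i:i + tile] = relu(tile_out)       # post-process before write-back
    dram["fused_out"] = out                    # single DRAM write
    return dram["fused_out"]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((256, 32)).astype(np.float32)
    w = rng.standard_normal((32, 64)).astype(np.float32)
    assert np.allclose(unfused_forward(x, w, {}), fused_forward(x, w, {}))
```

In the fused version every output tile is post-processed while still on chip, which mirrors the paper's idea of treating the memory-intensive operation as pre- or post-processing inside the fused layer rather than as a separate pass over off-chip memory.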
Authors
YANG Can (杨灿), WANG Chongxi (王重熙), ZHANG Longbing (章隆兵)
(State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049)
Source
《高技术通讯》
CAS
2023, No. 8, pp. 823-835 (13 pages)
Chinese High Technology Letters
Funding
Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC05020100).