Abstract
Neural networks have recently been deployed widely across many fields. As network models have grown more complex, graphics processing units (GPUs) have been adopted for deep learning. Although GPUs deliver excellent performance in accelerating matrix computation, the diversity and complexity of neural network models mean that GPU computing resources and device memory are not fully utilized during the compute-intensive training phase. This paper presents a fine-grained, experimental performance analysis of neural network training. First, it decomposes the training process into six stages from the perspective of data flow and measures the latency of each stage. It then quantitatively analyzes the GPU compute efficiency and resource utilization of each layer from three aspects: GPU-accelerated libraries, neural network models, and batch size. Finally, it quantifies the memory footprint of each layer's weights and feature maps to reveal GPU memory utilization. The experiments show that: (1) the compute efficiency of cuDNN for convolution layers is twice that of cuBLAS; (2) the resource utilization of convolution layers is 50% higher than that of fully connected layers; (3) memory utilization varies widely across layers and is low overall, never exceeding 20% of total GPU memory.
Authors
LI Jingjun
ZHANG Chen
CAO Qiang
LI Jingjun; ZHANG Chen; CAO Qiang (Key Laboratory of Information Storage System, Ministry of Education of China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China)
Source
《计算机科学与探索》
CSCD
Peking University Core Journal (北大核心)
2018, Issue 10, pp. 1645-1657 (13 pages)
Journal of Frontiers of Computer Science and Technology