
Analyzing Performance of Neural Networks in the Training Phase
(面向训练阶段的神经网络性能分析)
Cited by: 1
Abstract: Neural networks have recently been deployed in many fields. As network models grow more complex, graphics processing units (GPUs) have been adopted to accelerate deep learning. Although GPUs deliver excellent performance on matrix computation, the diversity and complexity of neural network models mean that GPU compute resources and memory are not fully utilized during the training phase. This paper presents an experimental, fine-grained performance analysis of the neural network training phase. First, it decomposes training into six stages from a data-flow perspective and measures the latency of each stage. It then quantitatively analyzes per-layer GPU compute efficiency and resource utilization from three angles: GPU-accelerated libraries, neural network models, and batch size. Finally, it quantifies the GPU memory occupied by each layer's weights and feature maps. The experiments show that (1) the compute efficiency of cuDNN convolution is twice that of cuBLAS; (2) the resource utilization of convolution layers is 50% higher than that of fully connected layers; (3) memory utilization varies greatly across layers and is low overall, never exceeding 20% of total GPU memory.
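The third analysis above accounts for the GPU memory held by each layer's weights and feature maps. A minimal sketch of that accounting is below; the layer shapes and batch size are illustrative (AlexNet-like), not values reported in the paper, and the helper names are hypothetical:

```python
# Hypothetical sketch: per-layer memory footprint of weights vs. feature maps,
# the two quantities the paper measures layer by layer. Assumes 4-byte floats.

def conv_memory(in_ch, out_ch, k, out_h, out_w, batch, bytes_per=4):
    """Return (weight_bytes, feature_map_bytes) for one convolution layer."""
    weight_bytes = out_ch * in_ch * k * k * bytes_per          # filter tensor
    fmap_bytes = batch * out_ch * out_h * out_w * bytes_per    # output activations
    return weight_bytes, fmap_bytes

def fc_memory(in_feat, out_feat, batch, bytes_per=4):
    """Return (weight_bytes, feature_map_bytes) for one fully connected layer."""
    return in_feat * out_feat * bytes_per, batch * out_feat * bytes_per

# An AlexNet-like first convolution layer at batch size 128:
w, f = conv_memory(in_ch=3, out_ch=96, k=11, out_h=55, out_w=55, batch=128)
print(w, f)   # feature maps dominate memory in convolution layers

# A 4096x4096 fully connected layer at the same batch size:
w, f = fc_memory(in_feat=4096, out_feat=4096, batch=128)
print(w, f)   # the reverse holds: weights dominate in fully connected layers
```

This asymmetry (activation-heavy convolution layers versus weight-heavy fully connected layers) is one reason per-layer measurement, rather than a single aggregate figure, is needed to see how training actually uses GPU memory.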
Authors: LI Jingjun (李景军), ZHANG Chen (张宸), CAO Qiang (曹强) (Key Laboratory of Information Storage System, Ministry of Education of China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD / Peking University Core, 2018, Issue 10, pp. 1645-1657 (13 pages)
Keywords: network models; graphics processing unit (GPU); resource utilization; compute efficiency; data flow; GPU-accelerated library