
Analyzing Performance of Neural Networks in the Training Phase
(面向训练阶段的神经网络性能分析)
Cited by: 1
Abstract: Neural networks have recently been deployed in many fields. As network models grow more complex, graphics processing units (GPUs) have been adopted to accelerate deep learning. Although GPUs deliver excellent performance on matrix computation, the diversity and complexity of neural network models mean that GPU compute resources and memory are not fully utilized during the training phase. This paper presents an experimental, fine-grained performance analysis of the neural network training phase. First, it decomposes training into six stages from a data-flow perspective and measures the latency of each stage. It then quantitatively analyzes per-layer GPU compute efficiency and resource utilization from three angles: GPU-accelerated libraries, neural network models, and batch size. Finally, it quantifies the GPU memory occupied by each layer's weights and feature maps. The experiments show that (1) the compute efficiency of cuDNN convolution is twice that of cuBLAS; (2) the resource utilization of convolution layers is 50% higher than that of fully connected layers; (3) memory utilization varies greatly across layers and is low overall, never exceeding 20% of total GPU memory.
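The third analysis above accounts for the GPU memory held by each layer's weights and feature maps. A minimal sketch of that accounting is below; the layer shapes and batch size are illustrative (AlexNet-like), not values reported in the paper, and the helper names are hypothetical:

```python
# Hypothetical sketch: per-layer memory footprint of weights vs. feature maps,
# the two quantities the paper measures layer by layer. Assumes 4-byte floats.

def conv_memory(in_ch, out_ch, k, out_h, out_w, batch, bytes_per=4):
    """Return (weight_bytes, feature_map_bytes) for one convolution layer."""
    weight_bytes = out_ch * in_ch * k * k * bytes_per          # filter tensor
    fmap_bytes = batch * out_ch * out_h * out_w * bytes_per    # output activations
    return weight_bytes, fmap_bytes

def fc_memory(in_feat, out_feat, batch, bytes_per=4):
    """Return (weight_bytes, feature_map_bytes) for one fully connected layer."""
    return in_feat * out_feat * bytes_per, batch * out_feat * bytes_per

# An AlexNet-like first convolution layer at batch size 128:
w, f = conv_memory(in_ch=3, out_ch=96, k=11, out_h=55, out_w=55, batch=128)
print(w, f)   # feature maps dominate memory in convolution layers

# A 4096x4096 fully connected layer at the same batch size:
w, f = fc_memory(in_feat=4096, out_feat=4096, batch=128)
print(w, f)   # the reverse holds: weights dominate in fully connected layers
```

This asymmetry (activation-heavy convolution layers versus weight-heavy fully connected layers) is one reason per-layer measurement, rather than a single aggregate figure, is needed to see how training actually uses GPU memory.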
Authors: LI Jingjun (李景军), ZHANG Chen (张宸), CAO Qiang (曹强) (Key Laboratory of Information Storage System, Ministry of Education of China; Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD / Peking University Core, 2018, Issue 10, pp. 1645-1657 (13 pages)
Keywords: network models; graphics processing unit (GPU); resource utilization; compute efficiency; data flow; GPU-accelerated library