Abstract
Image detection and recognition are being applied in a growing number of production and everyday scenarios, and methods based on convolutional neural networks (CNNs) are widely used because of their high accuracy. However, CNNs have many weight parameters and high computational requirements, so their use is constrained on edge computing devices, which have limited computing power and come in many different models. Running high-performance code across platforms and optimizing CNNs on GPUs are therefore increasingly important. To address the convolution sizes found in CNNs and the shortcomings of other general matrix multiplication (GEMM) methods, we propose a GEMM optimization method tailored to CNN problem sizes, designed around block size, branch execution, and the ratio of memory access to computation. We apply it to the Winograd algorithm and combine it with operator fusion to further optimize convolution. In addition, a traversal-based self-tuning step selects the best-performing convolution operator, and offline compilation, a memory pool, 16-bit quantization, and network size pruning are combined to improve CNN performance. Finally, experiments on the AMD V1605B platform verify the effectiveness of the algorithm. Comparisons with other GEMM algorithms and deep learning networks show that the method achieves better acceleration than the GEMM and Winograd algorithms and effectively accelerates convolutional neural networks.
Authors
Li Maowen (李茂文)
Qu Guoyuan (曲国远)
Wei Dazhou (魏大洲)
Jia Haipeng (贾海鹏)
(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; Chinese Aeronautical Radio Electronics Research Institute, Shanghai 200241)
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journals
2022, No. 6, pp. 1181-1191 (11 pages)
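Funding
National Key Research and Development Program of China (2107YFB0202105, 2016YFB0200803, 2017YFB0202302)
National Natural Science Foundation of China (61972376)
Beijing Natural Science Foundation (L182053)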
Keywords
general matrix multiplication (GEMM)
Winograd algorithm
convolutional neural network
performance optimization
GPU