Design and implementation of a compiler based on a special convolutional neural network accelerator (cited by: 1)
Abstract: Deploying deep learning models built in different frameworks is central to putting artificial intelligence into practice. However, models' excessive computation and parameter counts, together with the lack of a unified programming model across the many new dedicated Convolutional Neural Network (CNN) accelerators that keep emerging, make deployment more difficult. This work improves two aspects: model compression and the compilation toolchain. For model compression, a new channel pruning criterion is proposed that combines the correlation and influence of each channel with the activation values of the corresponding output channel; it greatly reduces the computation and parameter counts of a CNN while preserving accuracy. For the compilation toolchain, an automatic end-to-end optimization stack is designed and a design method for a Field Programmable Gate Array (FPGA)-based deep learning compiler is proposed, with the pruning algorithm that uses the proposed ranking criterion added to the intermediate representation. Experimental results on a ship target detection task show that the compiler achieves a 1.3x speedup on general-purpose devices while keeping the accuracy loss within 1%, and a 1.6x speedup on the dedicated CNN accelerator, so it can effectively accelerate convolutional networks in deployment.
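The abstract names three ingredients of the pruning criterion (channel correlation, channel influence, and output-channel activation values) but does not give the formula. The sketch below is only an illustration of how such a ranking could be assembled, not the authors' criterion. It makes these assumptions: influence is the L1 norm of a filter, correlation is the largest absolute Pearson correlation with any other filter, the three terms are combined linearly with hypothetical weights `alpha`, `beta`, and `gamma`, and all function names are invented for this example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length weight vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def normalize(values):
    """Scale non-negative values into [0, 1] by dividing by their maximum."""
    top = max(values) or 1.0
    return [v / top for v in values]


def channel_scores(filters, mean_activations, alpha=1.0, beta=1.0, gamma=1.0):
    """Score each output channel; a higher score means the channel is more worth keeping.

    filters: one flattened weight vector per output channel.
    mean_activations: mean absolute activation per output channel,
        measured on calibration data (assumed to be given).
    """
    # Assumed "influence" term: L1 norm of the filter weights.
    influence = normalize([sum(abs(w) for w in f) for f in filters])
    activation = normalize(list(mean_activations))
    scores = []
    for i, f in enumerate(filters):
        # Assumed "correlation" term: a channel that closely mirrors
        # another channel is treated as redundant and penalized.
        redundancy = max(
            (abs(pearson(f, g)) for j, g in enumerate(filters) if j != i),
            default=0.0,
        )
        scores.append(alpha * influence[i] + gamma * activation[i] - beta * redundancy)
    return scores


def select_channels(filters, mean_activations, keep_ratio=0.5):
    """Return the sorted indices of the channels to keep."""
    scores = channel_scores(filters, mean_activations)
    keep = max(1, round(keep_ratio * len(filters)))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    toy_filters = [
        [1.0, -1.0, 1.0, -1.0],
        [0.01, 0.0, -0.01, 0.0],
        [2.0, 0.5, -1.0, 0.3],
        [0.0, 0.02, 0.0, -0.01],
    ]
    toy_activations = [0.9, 0.05, 1.2, 0.01]
    print(select_channels(toy_filters, toy_activations, keep_ratio=0.5))
```

In a compiler setting, a ranking of this kind would be evaluated per convolution node in the intermediate representation, and the node's weight tensor, together with the input channels of its consumers, would be sliced to the kept indices. How the paper performs that rewrite is not stated in the abstract.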
Authors: JIAO Yuming; WU Kai; GUO Fengxiang; WANG Zhao; SONG Qingzeng (School of Computer Science and Technology, Tiangong University, Tianjin 300387, China; School of Electrical Engineering, Tiangong University, Tianjin 300387, China; Information Science Academy, China Electronics Technology Group Corporation, Beijing 100086, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2022, Issue S01, pp. 208-214 (7 pages)
Keywords: Field Programmable Gate Array (FPGA); model compression; deep learning compiler; intermediate representation; object detection
