
Deep neural network model acceleration method based on tensor virtual machine
Abstract With the rapid development of artificial intelligence (AI) technology, deep neural network (DNN) models have been widely deployed on mobile and edge devices. However, edge devices have low computing power and small memory capacity, and achieving model acceleration requires in-depth knowledge of the edge hardware, which makes model deployment difficult and limits the wider adoption of these models. Therefore, a DNN acceleration and deployment method based on the Tensor Virtual Machine (TVM) was proposed to accelerate convolutional neural network (CNN) models on a field-programmable gate array (FPGA), and its feasibility was verified in a distracted-driving classification scenario. Specifically, computational graph optimization was used to reduce the memory access and computational overhead of the model, model quantization was used to reduce the model size, and computational graph packing was used to offload the convolution computation to the FPGA to speed up model inference. Compared with running on a microprocessor unit (MPU) alone, the proposed method reduces the inference time of ResNet50 and ResNet18 on MPU+FPGA by 88.63% and 77.53% respectively; on the AUC (American University in Cairo) dataset, the top-1 inference accuracies of the two models on MPU+FPGA drop by only 0.26 and 0.16 percentage points respectively compared with the MPU. These results show that the proposed method lowers the difficulty of deploying different models on FPGA.
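The abstract names three techniques: graph optimization, model quantization, and graph packing for FPGA offload. The paper's actual pipeline runs through TVM and its VTA accelerator backend; the snippet below is only a minimal NumPy sketch of two of those ideas — symmetric int8 post-training quantization (which shrinks float32 weights roughly 4x, consistent with the small top-1 drop reported) and blocked channel packing of an NCHW tensor into the fixed-width vector layout that FPGA tensor units typically consume. The helper names `quantize_int8` and `pack_nchw` are illustrative, not from the paper.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training int8 quantization of a weight tensor.

    Returns the int8 tensor and the scale needed to dequantize:
    w is approximated by scale * w_q, so storage drops from 4 bytes
    to 1 byte per weight at a small accuracy cost.
    """
    scale = np.abs(w).max() / 127.0
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale

def pack_nchw(x, cb):
    """Pack an NCHW tensor into a blocked NCHW[cb]c layout.

    Channels are split into contiguous blocks of size cb so an
    accelerator can stream fixed-width vectors from memory.
    """
    n, c, h, w = x.shape
    assert c % cb == 0, "channel count must be divisible by the block size"
    return x.reshape(n, c // cb, cb, h, w).transpose(0, 1, 3, 4, 2)

# Quantize a hypothetical 64x3x3x3 conv kernel: the round-trip error
# is bounded by half a quantization step (scale / 2).
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q, s = quantize_int8(w)
err = np.abs(w - s * w_q.astype(np.float32)).max()

# Pack a 2x16x4x4 activation with channel blocks of 8.
x = np.arange(2 * 16 * 4 * 4, dtype=np.float32).reshape(2, 16, 4, 4)
xp = pack_nchw(x, cb=8)
print(w_q.dtype, err <= s / 2, xp.shape)  # int8 True (2, 2, 4, 4, 8)
```

In TVM itself these steps are applied at the graph level (e.g. via its relay quantization and layout-transform passes) rather than tensor by tensor, but the data movement they imply is the same.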
Authors 申云飞 (SHEN Yunfei), 申飞 (SHEN Fei), 李芳 (LI Fang), 张俊 (ZHANG Jun) (Institute of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230031, China; High Magnetic Field Laboratory, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031, China; High Magnetic Field Laboratory of Anhui Province, Hefei, Anhui 230031, China)
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2023, No. 9, pp. 2836-2844 (9 pages)
Funding Anhui Provincial Key Research and Development Program (202004h07020031); Key Research and Development Project of the Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003).
Keywords Tensor Virtual Machine (TVM); Deep Neural Network (DNN); Field-Programmable Gate Array (FPGA); edge device; model deployment; model acceleration