Fast Convolution Automatic Performance Optimization Based on Tensor Virtual Machine (cited by: 1)
Abstract: Convolutional neural networks (CNNs), a quintessential representative of deep learning, are the most commonly used neural networks in tasks such as computer vision. However, convolution operations typically account for over 90% of the total runtime of a CNN, making them the performance bottleneck. In addition, because of the complexity of current hardware and the diversity of workloads, the specific optimizations proposed in previous work often lack performance portability. To address this problem, the authors introduce BlazerML, an open-source convolution library based on automatic code generation from Tensor Virtual Machine (TVM) templates, which can automatically generate a high-performance convolution implementation for any input shape. BlazerML is built on the Winograd algorithm, chosen because it is the highest-performing of the fast convolution algorithms. Experimental results show that BlazerML significantly outperforms current state-of-the-art open-source libraries. On x86 CPUs, forward inference of common deep learning networks runs 1.18 to 2.47 times, 1.18 to 2.27 times, and 1.01 to 1.66 times faster than with OnnxRuntime, MNN, and the TVM community version, respectively. On ARM CPUs, single-layer inference of common deep learning networks runs 1.26 to 6.11 times and 1.04 to 4.28 times faster than with ACL and FastConv, respectively.
Authors: CHEN Jiang (陈疆); ZHU Honglin (朱泓霖); MENG Jintao (孟金涛); WEI Yanjie (魏彦杰) (Southern University of Science and Technology, Shenzhen 518055, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Shenzhen Tencent Computer System Co. Ltd., Shenzhen 518063, China)
Source: Journal of Integration Technology (《集成技术》), 2024, No. 5, pp. 3-18 (16 pages)
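Funding: Key-Area Research and Development Program of Guangdong Province (2021B0101310002); National Natural Science Foundation of China (62272449); Shenzhen Basic Research Program (RCYX20200714114734194, KQTD20200820113106007, ZDSYS20220422103800001); Youth Innovation Promotion Association of the Chinese Academy of Sciences (Y2021101).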
Keywords: deep learning; convolutional neural networks; fast convolution algorithms; Winograd algorithm; TVM; automatic performance optimization
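
The abstract names the Winograd algorithm as the fast convolution method underlying BlazerML but does not show it. As a rough illustration only, the sketch below implements the textbook one-dimensional F(2,3) Winograd minimal filtering step, which computes two outputs of a 3-tap filter with 4 multiplications instead of 6. It is not taken from BlazerML or TVM; the function names `direct_conv_1d` and `winograd_f23` are hypothetical, and a real library would apply the 2D nested form over tiles with vectorized, auto-tuned code.

```python
def direct_conv_1d(d, g):
    """Reference: two outputs of a 3-tap sliding dot product over 4 inputs.

    Uses 6 multiplications (3 per output).
    """
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Textbook Winograd F(2,3): same two outputs with 4 multiplications.

    d: 4 input samples, g: 3 filter taps. Illustrative sketch only.
    """
    # Filter transform (can be precomputed once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions and subtractions only).
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The only 4 multiplications in the algorithm.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3

    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]
```

For example, with `d = [1, 2, 3, 4]` and `g = [1, 0, -1]`, both functions return -2 for each of the two outputs. The saving in multiplications is paid for with extra additions in the transforms, which is one reason the best tile size and code layout depend on the hardware and input shape.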