Abstract
Convolutional neural networks (CNNs), a typical representative of deep learning, are the most commonly used neural networks in tasks such as computer vision. However, convolution operations typically account for more than 90% of a CNN's runtime, making them the performance bottleneck. In addition, because of the complexity of current hardware and the diversity of workloads, hardware-specific optimizations in previous work often lack performance portability. To address this, the authors propose BlazerML, an open-source convolution library based on automatic code generation from Tensor Virtual Machine (TVM) templates, which can automatically generate a high-performance convolution implementation for any input shape. BlazerML is built on the Winograd algorithm, because it is the highest-performing of the fast convolution algorithms. Experimental results show that BlazerML significantly outperforms current state-of-the-art open-source libraries. On x86 CPUs, forward inference of common deep learning networks runs 1.18 to 2.47 times, 1.18 to 2.27 times, and 1.01 to 1.66 times faster than OnnxRuntime, MNN, and the TVM community version, respectively. On ARM CPUs, single-layer inference of common deep learning networks runs 1.26 to 6.11 times and 1.04 to 4.28 times faster than ACL and FastConv, respectively.
Authors
CHEN Jiang (陈疆)
ZHU Honglin (朱泓霖)
MENG Jintao (孟金涛)
WEI Yanjie (魏彦杰)
Affiliations: Southern University of Science and Technology, Shenzhen 518055, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Shenzhen Tencent Computer System Co. Ltd., Shenzhen 518063, China
Source
Journal of Integration Technology (《集成技术》)
2024, Issue 5, pp. 3-18 (16 pages)
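Funding
Key-Area Research and Development Program of Guangdong Province (2021B0101310002)
National Natural Science Foundation of China (62272449)
Shenzhen Basic Research Program (RCYX20200714114734194, KQTD20200820113106007, ZDSYS20220422103800001)
Youth Innovation Promotion Association of the Chinese Academy of Sciences (Y2021101)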