Abstract
In recent years, research on artificial neural networks has achieved great success, with breakthroughs in fields such as image recognition and natural language processing, and it has produced many commercial applications that make daily life more convenient, such as voice assistants and assisted driving. Neural network algorithms are both compute-intensive and memory-intensive workloads, so traditional CPUs can no longer meet the demands of large-scale commercial deployment. Academia and industry have therefore sought breakthroughs on GPUs, FPGAs, and ASICs. Among these, the neural network accelerator is a kind of ASIC that offers a high-performance, low-power hardware solution, and research on it is growing. Because a neural network accelerator is a coprocessor, data must be moved between host memory and device memory before and after each computation. This matters most for neural network inference tasks with high throughput requirements, where both constant data (network model parameters and hardware instructions) and variable data (inputs and outputs) must be copied from host memory into device memory. If the constant data is copied again before each input is computed, it is copied repeatedly, which wastes time and storage resources. This raises a series of questions worth studying. How can neural network development tool software copy the variable data many times but the constant data only once? How can it ensure that the instructions address constants and variables correctly in every computation? And how can it simplify user programming and provide a user-friendly interface? In this paper, we propose a neural network development tool based on asynchronous copying of constants and variables, together with its programming model, QingLong, to solve these problems. The QingLong programming model consists of three stages: network definition, network compilation, and network computation. In the definition stage, users can bind constant data to the data nodes of the neural network. In the compilation stage, the constant data is encapsulated into a data package using the REOFF data packaging method. In the computation stage, after copying the data package once, users can copy in input data and compute output results repeatedly. The programming model offers three advantages: compilation is separated from computation, constants and variables are copied asynchronously, and computation and data copying can be split into a three-stage pipeline. Experiments show that when computing 100 input samples consecutively, QingLong achieves an average speedup of 17.48x over DLPlib, and the speedup grows as the number of input samples increases.
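The copy-once behavior of the three-stage model can be illustrated with a toy simulation. The following is a minimal Python sketch under stated assumptions. Every class, method, and parameter name (`Device`, `Network`, `bind_constant`, `compile`, `run`, `copy_constants_every_time`) is hypothetical and is not QingLong's actual interface. Device memory is simulated with a dictionary. The packaging step only concatenates blobs and records their offsets, standing in for REOFF, whose details the abstract does not give. The sketch counts host-to-device bytes when the constant package is copied once, and compares that with the naive scheme the abstract describes, where constants are re-copied before every input.

```python
# Hypothetical toy model of a define / compile / compute flow in which the
# constant package is copied to the device once. All names are illustrative
# assumptions, not QingLong's real API; REOFF itself is not reproduced here.
from dataclasses import dataclass, field


@dataclass
class Device:
    """Simulated device memory that counts bytes copied in from the host."""
    memory: dict = field(default_factory=dict)
    bytes_copied: int = 0

    def copy_in(self, key: str, data: bytes) -> None:
        self.memory[key] = data
        self.bytes_copied += len(data)


@dataclass
class Network:
    """Network description; binding constants corresponds to stage 1 (define)."""
    constants: dict = field(default_factory=dict)

    def bind_constant(self, node: str, data: bytes) -> None:
        self.constants[node] = data

    def compile(self, instructions: bytes) -> dict:
        """Stage 2 (compile): pack instructions and constants into one package.

        Each blob's offset is recorded so instructions could address constants
        inside the package; plain concatenation stands in for REOFF.
        """
        package, offsets = b"", {}
        for name, blob in [("instructions", instructions), *self.constants.items()]:
            offsets[name] = len(package)
            package += blob
        return {"data": package, "offsets": offsets}


def run(device: Device, package: dict, inputs: list, copy_constants_every_time: bool) -> list:
    """Stage 3 (compute): copy the package once (or per sample), then each input."""
    if not copy_constants_every_time:
        device.copy_in("package", package["data"])
    outputs = []
    for i, x in enumerate(inputs):
        if copy_constants_every_time:
            device.copy_in("package", package["data"])  # naive: constants re-copied
        device.copy_in(f"input{i}", x)  # variables are copied for every sample
        outputs.append(sum(x))  # stand-in for the accelerator's real computation
    return outputs


net = Network()
net.bind_constant("conv1.weight", bytes(1000))  # hypothetical 1000-byte weights
pkg = net.compile(instructions=bytes(200))      # hypothetical 200-byte program
samples = [bytes(10)] * 100                     # 100 hypothetical 10-byte inputs

once, every = Device(), Device()
run(once, pkg, samples, copy_constants_every_time=False)
run(every, pkg, samples, copy_constants_every_time=True)
print(once.bytes_copied, every.bytes_copied)  # prints: 2200 121000
```

With these made-up sizes, the copy-once run moves 2,200 bytes and the copy-every-time run moves 121,000 bytes. The gap widens as the number of samples grows, which qualitatively mirrors the trend reported above without modeling the actual experiment. Once the constants are resident on the device, each sample only needs copy-in, compute, and copy-out, and these are the three steps that can be pipelined.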
Authors
DU Wei-Jian
CHEN Yun-Ji
ZHI Tian
WU Lin-Yang
CHEN Xiao-Bing
ZHUANG Yi-Min
(State Key Laboratory of Computer Architecture, Institute of Computing Technology, CAS, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Cambricon Technologies, Shanghai 201308; Institute of Brain-Intelligence Technology, Zhangjiang Laboratory, Shanghai 201308; Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai 201308; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 201308; Cambricon Technologies, Beijing 100190)
Source
Chinese Journal of Computers (计算机学报)
Indexed in EI, CSCD, and the Peking University Core Journals list
2020, No. 4, pp. 587-599 (13 pages)
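Funding
National Key Research and Development Program of China (2017YFA0700900, 2017YFA0700902, 2017YFA0700901, 2017YFB1003101, 2018AAA0103300)
National Natural Science Foundation of China (61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, 61732007, 61732020)
Beijing Natural Science Foundation (JQ18013)
National Science and Technology Major Project "Core Electronic Devices, High-End General-Purpose Chips and Basic Software Products" (2018ZX01031102)
Key Program for the Transfer and Transformation of Scientific and Technological Achievements, Chinese Academy of Sciences (KFJ-HGZX-013)
Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDBSSW-JSC001)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDB32050200, XDC01020000)
Standardization Research Project of the Chinese Academy of Sciences (BZ201800001)
Beijing Academy of Artificial Intelligence (BAAI)
Beijing Nova Program (Z191100001119093)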
Keywords
neural network
programming model
constant and variable
asynchronous copy
software development kit