Abstract
In recent years, with the continuous development of deep learning, deep convolutional neural networks have been widely applied in practical scenarios. However, the parameters of a trained model are usually stored, and inference is performed, in 32-bit floating point, which brings high computational complexity, a large memory footprint, and long inference times. As a result, models with high accuracy are often difficult to deploy on edge devices with limited computing and memory resources. To address this, this paper proposes a quantization algorithm that converts a floating-point model into a fixed-point model. During training, the network learns the quantization precision that each layer should adopt, avoiding the severe accuracy loss, relative to the floating-point model, that manually chosen precisions can cause in the resulting fixed-point model. Finally, the model is deployed at the edge on an ASIC neural network accelerator chip, demonstrating the effectiveness of the algorithm.
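The mechanism the abstract hints at, letting each layer learn its own quantization bit width during training, can be illustrated with a short sketch. This is a minimal illustration assuming PyTorch and a symmetric fake-quantization scheme trained with a straight-through estimator; the module name LearnedPrecisionQuantizer and all hyperparameters below are hypothetical, not details from the paper.

    # Minimal sketch (not the authors' algorithm): a per-layer quantizer whose
    # bit width is a trainable parameter rather than a hand-set constant.
    import torch
    import torch.nn as nn

    class LearnedPrecisionQuantizer(nn.Module):
        """Fake-quantizes a tensor with a learnable, per-layer bit width."""

        def __init__(self, init_bits=8.0, min_bits=2.0, max_bits=8.0):
            super().__init__()
            # Continuous surrogate for the bit width so it can receive
            # gradients; at deployment it would be rounded to an integer
            # precision supported by the target hardware.
            self.bits = nn.Parameter(torch.tensor(float(init_bits)))
            self.min_bits, self.max_bits = min_bits, max_bits

        def forward(self, x):
            bits = self.bits.clamp(self.min_bits, self.max_bits)
            qmax = 2.0 ** (bits - 1) - 1           # differentiable in `bits`
            scale = x.detach().abs().max().clamp(min=1e-8) / qmax
            z = x / scale
            # Straight-through estimator: the forward pass rounds, the
            # backward pass treats rounding as identity, so gradients still
            # reach both the layer weights and `self.bits`.
            z_q = z + (z.round() - z).detach()
            z_q = torch.clamp(z_q, -qmax, qmax)    # keeps a path into `bits`
            return z_q * scale

In a full training setup one would typically also add a bit-width penalty to the task loss (for example, a small multiple of the sum of all layers' learned bit widths) so that layers keep high precision only where accuracy demands it, then round each learned bit width to an integer supported by the target ASIC at deployment time.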
Authors
Wang Qian, Tao Qingchuan (School of Electronic Information, Sichuan University, Chengdu 610065)
Source
Modern Computer (《现代计算机》), 2021, No. 36, pp. 28-33 (6 pages)
Keywords
parameter quantization
deep learning
acceleration chip