Abstract
Current convolutional neural network models are large and computationally complex, which makes them difficult to deploy on resource-constrained computing platforms. To address this problem, this paper proposes a logarithmic quantization method based on the standard deviation of the data, suitable for deployment on field programmable gate arrays (FPGAs). First, in accordance with the characteristics of FPGAs, a logarithmic quantization method is proposed that converts 32-bit floating-point multiplication into integer multiplication and shift operations, improving computational efficiency. Then, by studying the characteristics of the data distribution, input quantization and mixed-bit weight quantization methods based on the data standard deviation are proposed, which effectively reduce quantization loss. Efficiency and accuracy are compared on networks such as RepVGG and EfficientNet: 8-bit quantization reduces the accuracy of large neural networks by only about 1%, and when the input is quantized to 8 bits and the weights to 10 bits, the accuracy loss is less than 0.2%, almost matching the floating-point model. The experiments show that the proposed method reduces model size by about 75%, and that it effectively reduces power consumption and improves computing efficiency while largely preserving the accuracy of the original model.
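The abstract does not give the quantization formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a generic power-of-two (logarithmic) weight quantizer, so that a multiplication becomes a bit shift, and a symmetric integer quantizer whose clipping range is a multiple of the data standard deviation. The function names, the exponent range, and the factor `k = 3.0` are all hypothetical choices made for illustration.

```python
import math
import statistics


def log2_quantize(w, min_exp=-7, max_exp=0):
    """Round a float weight to sign * 2**e and return (sign, e). Hypothetical helper."""
    if w == 0.0:
        return 0, 0  # sign 0 encodes an exactly-zero weight
    sign = 1 if w > 0 else -1
    e = round(math.log2(abs(w)))       # nearest exponent in the log domain
    e = max(min_exp, min(max_exp, e))  # clip to an assumed exponent range
    return sign, e


def shift_multiply(x_int, sign, e):
    """Compute x_int * sign * 2**e with shifts instead of a float multiply."""
    if sign == 0:
        return 0
    shifted = x_int << e if e >= 0 else x_int >> -e  # right shift floors the result
    return sign * shifted


def std_clip_quantize(values, bits=8, k=3.0):
    """Symmetric integer quantization, clipping at k standard deviations (assumption)."""
    sigma = statistics.pstdev(values)
    qmax = 2 ** (bits - 1) - 1
    if sigma == 0.0:
        return [0 for _ in values], 1.0  # degenerate input: no spread to scale by
    limit = k * sigma
    scale = limit / qmax
    # Clip outliers to [-limit, limit], then map onto the signed integer grid.
    q = [round(max(-limit, min(limit, v)) / scale) for v in values]
    return q, scale
```

For example, the weight 0.26 is rounded to 2^-2 under this sketch, so multiplying the integer activation 64 by it reduces to a right shift by two bits, giving 16. The reported size reduction of about 75% is consistent with storing each weight in 8 bits instead of 32, since 8/32 = 0.25 of the original size remains.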
Authors
黄赟
张帆
郭威
陈立
羊光
HUANG Yun; ZHANG Fan; GUO Wei; CHEN Li; YANG Guang (Information Engineering University, Zhengzhou, Henan 450001, China; National Digital Switching System Engineering Technology Research Center, Zhengzhou, Henan 450002, China; Henan Administration of Radio and Television Monitoring Center, Zhengzhou, Henan 450002, China)
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journals
2023, No. 3, pp. 639-647 (9 pages)
Acta Electronica Sinica
Funding
Innovative Research Groups Project of the National Natural Science Foundation of China (No. 61521003).
Keywords
convolutional neural networks
field programmable gate array (FPGA)
logarithmic quantization
standard deviation of the data
mixed bit number