
CNN data quantization method based on reconfigurable array
Abstract: The large number of convolution operations in convolutional neural network (CNN) models greatly increases network size, which makes the models difficult to deploy on embedded hardware platforms, and data of differing granularity that is poorly matched to the underlying hardware structure lowers computing efficiency. To address these problems, this work builds on a reconfigurable array processor developed by the authors' project group, whose computing units support multiple bit widths. Using software-hardware co-design and reconfigurable computing, it defines the quantization threshold with Kullback-Leibler (KL) divergence and truncates values with stochastic rounding to find the best radix point position for fixed-length parameters. It also designs instructions that support parallel operation at multiple computing granularities, together with the corresponding convolution mapping scheme, and uses them to implement dynamic data quantization at three different bit widths. Experimental results show that quantizing the weights and feature maps to 8 bit each compresses the model to about 50% of its original size at a 2% loss in accuracy. In hardware tests with the test images quantized to the three bit widths, the speedup ratios reach 1.012, 1.273, and 1.556 respectively, execution time is shortened by up to 35.7%, and the number of memory accesses is reduced by up to 56.2%, with a relative error below 1%. The method therefore achieves efficient neural network computation at all three quantization bit widths, delivering both hardware acceleration and model compression.
Authors: Zhu Jiayang (朱家扬); Jiang Lin (蒋林); Li Yuancheng (李远成); Song Jia (宋佳); Liu Shuai (刘帅). Affiliations: School of Communication & Information Engineering, Xi'an University of Science & Technology, Xi'an 710600, China; School of Computer Science & Technology, Xi'an University of Science & Technology, Xi'an 710600, China; School of Electrical & Control Engineering, Xi'an University of Science & Technology, Xi'an 710600, China
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 4, pp. 1070-1076 (7 pages)
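Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0119005); Key Program of the National Natural Science Foundation of China (61834005).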
Keywords: convolutional neural network (CNN); data quantization; reconfigurable structure; parallel mapping; acceleration ratio
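
The abstract combines KL divergence, stochastic rounding, and a search for the best radix point position of fixed-length parameters. The Python sketch below illustrates that general idea only. It is not the authors' implementation: the abstract does not give their procedure, so the function names (`quantize_fixed_point`, `histogram`, `kl_divergence`, `best_frac_bits`), the histogram bin count, the exhaustive search over radix positions, and the Gaussian test weights are all assumptions made for illustration.

```python
import math
import random


def quantize_fixed_point(values, bit_width, frac_bits, rng):
    """Quantize floats to signed fixed point using stochastic rounding.

    frac_bits is the radix point position: the step size is 2**-frac_bits.
    """
    scale = 2 ** frac_bits
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    out = []
    for v in values:
        scaled = v * scale
        floor = math.floor(scaled)
        # Round up with probability equal to the fractional remainder.
        q = floor + (1 if rng.random() < scaled - floor else 0)
        # Saturate values that fall outside the representable range.
        q = max(q_min, min(q_max, q))
        out.append(q / scale)
    return out


def histogram(values, lo, hi, bins):
    """Normalized histogram, smoothed so the KL divergence stays finite."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    eps = 1e-6
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def best_frac_bits(values, bit_width, bins=64, seed=0):
    """Try every radix point position and keep the one with the lowest KL divergence."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    reference = histogram(values, lo, hi, bins)
    best = None
    for frac_bits in range(bit_width):
        quantized = quantize_fixed_point(values, bit_width, frac_bits, rng)
        kl = kl_divergence(reference, histogram(quantized, lo, hi, bins))
        if best is None or kl < best[1]:
            best = (frac_bits, kl)
    return best


# Hypothetical usage: synthetic Gaussian "weights" stand in for real CNN parameters.
data_rng = random.Random(42)
weights = [data_rng.gauss(0.0, 0.5) for _ in range(2000)]
frac_bits, kl = best_frac_bits(weights, bit_width=8)
print(f"8-bit: frac_bits={frac_bits}, KL={kl:.4f}")
```

Moving the radix point trades range against resolution: fewer fractional bits avoid saturation of large values, while more fractional bits preserve small values. Comparing histograms with KL divergence is one plausible way to score that trade-off, and stochastic rounding keeps the rounding error unbiased on average.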