Abstract
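To address the high resource requirements and high system power consumption of current memristor-based (ReRAM) neural network accelerators, a neural network model compression framework comprising pruning and quantization algorithms is proposed. Exploiting the tightly coupled structure of memristor arrays, a memristor-array-aware structured incremental pruning algorithm is designed, which saves hardware resources while preserving model accuracy. Because the ADC units and memristor arrays account for an excessive share of power in the accelerator system, a power-of-two quantization algorithm is designed to lower the ADC resolution requirement and the number of low-resistance memristor devices in the compute arrays, thereby reducing system power. Experimental results show that, when networks are deployed on a memristor accelerator, the proposed compression framework achieves a 17.2x to 30.7x energy-efficiency improvement and a 4.3x to 9.3x speedup, with model accuracy loss held at about 1%.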
Current ReRAM-based neural network (NN) accelerators suffer from problems such as high hardware resource demand and high power consumption. An energy-efficient model compression framework consisting of pruning and quantization algorithms is proposed. Based on the tightly coupled crossbar structure and unstructured sparsity, a crossbar-aware incremental structured pruning algorithm is designed to achieve higher sparsity and accuracy. A power-of-two quantization method is designed to reduce ADC resolution requirements and the number of low-resistance-state (LRS) ReRAM cells in crossbars, improving energy efficiency. Experimental results show that the proposed model compression framework achieves a 17.2-30.7x improvement in energy efficiency and a 4.3-9.3x speedup compared with ReRAM-based accelerators for dense NNs, with about 1% accuracy loss.
Authors
沈林耀
王琴
蒋剑飞
景乃锋
SHEN Linyao; WANG Qin; JIANG Jianfei; JING Naifeng (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China)
Source
《微电子学与计算机》
2021, No. 8, pp. 20-27 (8 pages)
Microelectronics & Computer
Keywords
ReRAM-based accelerator
Neural networks
Quantization
Pruning