
Pruning Approach to Neural Networks Based on Zero-norm Regularization
Abstract: This paper proposes an effective pruning method for neural networks. The method introduces a zero-norm regularization term into the network training model to promote weight sparsity, and compresses the model by deleting the weights whose values are zero. For the proposed zero-norm regularized training model, an equivalent locally Lipschitz surrogate is obtained by establishing a global exact penalty for its equivalent MPEC (mathematical program with equilibrium constraints) formulation; the network is then trained and pruned by solving this Lipschitz surrogate with the alternating direction method of multipliers (ADMM). Tests on the MLP and LeNet-5 models attain 97.43% and 99.50% sparsity at error rates of 2.2% and 1%, respectively, a strong pruning result.

Extended abstract: Deep neural networks (DNNs) have become ubiquitous in daily life, from autonomous driving to smart homes, and deploying DNN models on mobile devices and embedded systems has become an inevitable trend. Parameter redundancy has long been the main obstacle to neural network inference and to deployment on mobile systems. In recent years, academia and industry have proposed many model-compression methods, such as quantization, knowledge distillation, and network pruning. Network pruning, an important means of model compression, reduces the number of parameters by removing some neural connections, effectively mitigating the high computational cost and large memory footprint caused by weight redundancy. The method in this article further extends the network-pruning model and its solution algorithm.

In this work, we propose an effective pruning method for neural networks to address the high computational cost and considerable memory bandwidth caused by the huge complexity and parameter redundancy of neural network models. The method improves the sparsity of the model weights by introducing a zero-norm regularization term into the training model, and compresses the model by deleting the zero weights. For the proposed zero-norm regularized model, we obtain an equivalent locally Lipschitz surrogate by establishing a global exact penalty for its equivalent MPEC formulation. When the activation function is the sigmoid, the loss of the resulting optimization model is a combination of a smooth term and a nonsmooth term; the smooth part can be handled by existing computational-graph frameworks, while the nonsmooth part admits an exact proximal expression. We therefore design a proximal alternating direction method of multipliers (P-ADMM) to solve the smooth-loss model induced by the sigmoid activation. Numerical experiments validate the efficiency of P-ADMM: the tests on the MLP and LeNet-5 networks yield 97.43% and 99.50% sparsity with error rates of 2.2% and 1%, respectively. The results show that our method effectively reduces model complexity and achieves a better sparsity ratio than other pruning methods, while being convenient to implement and easy to extend.

Because the neural network model is highly nonconvex, the convergence of the algorithm slows in later iterations even though the paper combines alternating minimization with a computational-graph framework. One direction for future research is therefore to devise an acceleration strategy that improves the convergence rate, and to investigate whether the nonconvex, nonsmooth model can be solved directly by gradient methods within backpropagation and computational-graph frameworks. Another interesting direction is how to design effective algorithms, with provable convergence guarantees, when the loss function itself is nonsmooth.
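To make the formulation concrete, the following is a minimal reconstruction of the standard zero-norm/MPEC machinery the abstract alludes to; the symbols (a generic training loss \mathcal{L}, penalty parameters \lambda and \rho) are illustrative placeholders, not the paper's notation. The zero-norm admits a variational characterization with a complementarity constraint:

\min_{W} \; \mathcal{L}(W) + \lambda \|W\|_{0},
\qquad
\|w\|_{0} \;=\; \min_{v \in [0,1]^{n}} \Big\{ \sum_{i=1}^{n} (1 - v_{i}) \;:\; \langle v, |w| \rangle = 0 \Big\}.

Replacing the constraint \langle v, |w| \rangle = 0 by the penalty \rho \langle v, |w| \rangle and minimizing over each v_i \in [0,1] gives, for sufficiently large \rho, the equivalent locally Lipschitz surrogate

\min_{W} \; \mathcal{L}(W) + \lambda \sum_{i} \min\{1, \; \rho\, |W_{i}|\},

i.e. the capped-\ell_1 function, since \min_{v_i \in [0,1]} \big[(1 - v_i) + \rho v_i |w_i|\big] = \min\{1, \rho |w_i|\}.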
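The sketch below illustrates, on a toy least-squares problem so that it runs as-is, the ADMM splitting pattern the abstract describes: alternating a gradient-based update of the smooth loss (where, in a real network, SGD/backprop on the computational graph would go) with an exact proximal update of the nonsmooth regularizer. This is not the paper's code; the names loss_grad, lam, beta, and hard_threshold, and the use of the plain l0 proximal map, are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))         # toy data matrix
w_true = np.zeros(50); w_true[:5] = 1.0    # sparse ground-truth weights
b = A @ w_true + 0.01 * rng.standard_normal(200)

lam, beta = 0.05, 1.0                      # regularization weight, ADMM penalty

def loss_grad(w):
    # gradient of the smooth loss L(w) = 0.5 * ||A w - b||^2
    return A.T @ (A @ w - b)

def hard_threshold(x, tau):
    # exact proximal map of tau * ||.||_0: keep entries with x_i^2 > 2 * tau
    z = x.copy()
    z[x**2 <= 2.0 * tau] = 0.0
    return z

w = np.zeros(50); z = np.zeros(50); u = np.zeros(50)
step = 1.0 / (np.linalg.norm(A, 2) ** 2 + beta)    # safe step size
for it in range(500):
    # w-step: a few gradient steps on L(w) + beta/2 * ||w - z + u||^2
    for _ in range(10):
        w -= step * (loss_grad(w) + beta * (w - z + u))
    # z-step: closed-form proximal update for the zero-norm term
    z = hard_threshold(w + u, lam / beta)
    # dual update
    u += w - z

sparsity = 1.0 - np.count_nonzero(z) / z.size
print(f"sparsity of pruned weights: {sparsity:.2%}")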
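Continuing the sketch above, the pruning step itself ("deleting the zero weights") can be pictured as fixing the zero pattern found by ADMM and fine-tuning only the surviving connections; the masked fine-tuning loop here is again an illustrative assumption, not the paper's procedure.

mask = (z != 0).astype(float)     # support found by the z-step
w_pruned = w * mask               # delete (zero out) pruned connections
for _ in range(200):              # brief fine-tuning restricted to the support
    w_pruned -= step * (loss_grad(w_pruned) * mask)
print("retained weights:", int(mask.sum()), "of", mask.size)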
Author: LIU Zhi (School of Mathematics, South China University of Technology, Guangzhou 510000, China)
Source: Operations Research and Management Science (《运筹与管理》), CSCD, Peking University Core Journal, 2023, Issue 10, pp. 102-107 (6 pages)
Funding: General Program of the National Natural Science Foundation of China (11971177).
Keywords: network pruning; zero-norm regularization; ADMM