Abstract
Gradient descent on large datasets tends to converge to local optima and to converge slowly. To address these problems, this paper proposes a dynamic attenuation network and a dynamic attenuation gradient descent algorithm, obtained by modifying both the network structure and the gradient descent procedure. Starting from an existing network, an attenuation weight is added between each pair of neurons in adjacent layers, and a corresponding attenuation weight term is introduced into the gradient descent process. The attenuation weight decays as the iterations proceed and eventually approaches 0. The added term speeds up descent and convergence in the early stage of training, while also avoiding overshooting the optimal solution and oscillating around it, which raises the probability that the network reaches the optimal solution. Experiments on the MNIST, CIFAR-10 and CIFAR-100 datasets show that, compared with the original networks trained with Adam and with stochastic gradient descent with momentum, the proposed network and algorithm improve test accuracy by 0.2%~1.89% and 0.75%~2.34%, respectively, and converge faster.
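The abstract does not give the update rule, so the following is only a minimal sketch of the general idea of a step that is enlarged early and whose enlargement decays toward 0. It assumes, purely for illustration, that the attenuation term scales the gradient step by (1 + a_t) with a_t = a0 * decay**t; the names `descend`, `a0` and `decay`, the scalar toy objective, and the exponential schedule are all hypothetical and are not taken from the paper, which applies attenuation weights between neurons of a network.

```python
def descend(grad, w0, lr, steps, a0=0.0, decay=1.0):
    """Gradient descent whose step is scaled by (1 + a_t), a_t = a0 * decay**t.

    a0 and decay are hypothetical parameters used for illustration only;
    with a0 = 0 this reduces to plain gradient descent.
    """
    w, a = w0, a0
    for _ in range(steps):
        w -= lr * (1.0 + a) * grad(w)  # attenuation term enlarges the early steps
        a *= decay                     # attenuation weight shrinks every iteration
    return w, a


def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, which is minimised at w = 3.
    return 2.0 * (w - 3.0)


plain_w, _ = descend(grad, w0=0.0, lr=0.05, steps=20)
decayed_w, final_a = descend(grad, w0=0.0, lr=0.05, steps=20, a0=4.0, decay=0.8)
print(abs(plain_w - 3.0), abs(decayed_w - 3.0), final_a)
```

On this toy problem the decayed variant ends closer to the minimum than plain gradient descent after the same number of steps, and the extra weight has nearly vanished by the end, so late updates behave like the ordinary rule. This says nothing about the accuracy figures reported in the abstract.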
Authors
Fei Chunguo (费春国)
Liu Qixuan (刘启轩)
College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
Source
Journal of Electronic Measurement and Instrumentation (《电子测量与仪器学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2022, Issue 7, pp. 230-238 (9 pages)
Keywords
deep learning
back propagation algorithm
local optimum
optimization algorithms