
Recurrent Batch Weight Averaging Optimization Algorithm Based on Gradient Descent
Abstract: The weights of a neural network model have a decisive impact on its performance, and weight updates are mainly carried out by gradient descent algorithms. The stochastic weight averaging algorithm improves the weight update scheme: it averages multiple weight samples collected during stochastic gradient descent to improve the model's generalization ability. However, that algorithm neither trains the averaged model further nor lets the averaged weights participate in, and influence, the training process. To address these limitations, and building on the existing recurrent stochastic weight averaging algorithm, this paper proposes a recurrent batch weight averaging algorithm. Unlike recurrent stochastic weight averaging, which averages between training epochs, the proposed algorithm averages the weights across the batches within each training epoch, uses the averaged weights as the initial weights for the next batch, and thereby cyclically fuses the averaging and updating of the weights. The proposed algorithm is compared with gradient descent and stochastic weight averaging in repeated simulation experiments on the VGG-16 and ResNet-18 models using the CIFAR-10 and CIFAR-100 datasets. The experimental results show that recurrent batch weight averaging significantly accelerates model convergence, improves training efficiency, and increases the model's accuracy on the test set.
Source: Advances in Applied Mathematics (《应用数学进展》), 2024, No. 6, pp. 2675-2686 (12 pages).
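The abstract does not spell out the exact averaging rule, so the PyTorch sketch below illustrates one plausible reading of the batch-level scheme: after each mini-batch update, the post-update weights are folded into a running average, and the averaged weights are loaded back into the model as the starting point for the next batch. The function name `train_epoch_rbwa`, the simple running mean, and the handling of integer buffers are assumptions made for illustration, not the authors' implementation.

```python
import copy
import torch


def train_epoch_rbwa(model, loader, criterion, optimizer, device="cpu"):
    """One training epoch with batch-level weight averaging (illustrative sketch)."""
    model.train()
    avg_state, n_averaged = None, 0

    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)

        # Ordinary gradient-descent update on the current mini-batch.
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

        # Fold the post-update weights into a running average over batches.
        current = model.state_dict()
        if avg_state is None:
            avg_state = copy.deepcopy(current)
            n_averaged = 1
        else:
            n_averaged += 1
            for k, v in current.items():
                if torch.is_floating_point(v):
                    avg_state[k] += (v - avg_state[k]) / n_averaged
                else:
                    avg_state[k] = v.clone()  # integer buffers (e.g. BatchNorm counters)

        # Load the averaged weights back so they serve as the initial weights
        # of the next batch, cyclically fusing averaging and updating.
        model.load_state_dict(avg_state)

    return model
```

In this reading, the running average is reset at the start of each epoch, matching the abstract's statement that averaging happens between batches within each training epoch; how the averaged weights are carried across epochs is not specified in the abstract.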