Abstract
A well-designed learning rate schedule can significantly improve the convergence rate of a deep learning model and reduce its training time. The AdaGrad and AdaDec learning strategies provide only a single learning rate form for all parameters of the model. Based on the characteristics of the model parameters, this paper proposes a combined learning rate strategy, AdaMix: the connection weights are given a learning rate that depends only on the current gradient, while the biases use a power-exponential learning rate. To compare the convergence of the model under the different strategies, the deep learning model Autoencoder is trained to reconstruct the MNIST image database, with the test-phase reconstruction error during back-propagation fine-tuning used as the evaluation index. The experimental results show that AdaMix achieves a lower reconstruction error and less computation than AdaGrad and AdaDec, and therefore converges faster.
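The abstract describes AdaMix only at a high level: a per-weight rate depending solely on the current gradient, and a power-exponential decayed rate for the biases. The sketch below illustrates one plausible reading of that scheme; the exact formulas (`eta0 / sqrt(g^2 + eps)` for weights, `eta0 * (1 + t/K)^(-p)` for biases) and all hyperparameter names are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def adamix_step(W, b, grad_W, grad_b, t,
                eta0=0.1, eps=1e-8, K=1000.0, p=0.75):
    """One AdaMix-style update (illustrative forms, not the paper's exact ones).

    Weights: per-element rate computed from the CURRENT gradient only,
             with no AdaGrad-style accumulated gradient history.
    Biases:  a globally shared, power-exponentially decaying rate.
    """
    # Weight rate from the current gradient magnitude only; this
    # normalizes each element's step to roughly eta0 in size.
    lr_W = eta0 / np.sqrt(grad_W ** 2 + eps)
    W = W - lr_W * grad_W

    # Power-exponential schedule for the bias learning rate:
    # decays smoothly as the iteration count t grows.
    lr_b = eta0 * (1.0 + t / K) ** (-p)
    b = b - lr_b * grad_b
    return W, b
```

Note that, unlike AdaGrad, no running sum of squared gradients is stored, which is consistent with the abstract's claim of lower computational cost.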
Source
Acta Automatica Sinica (《自动化学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2016, No. 6, pp. 953–958 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61271143)
Keywords
Deep learning
learning rate
combined learning scheduling
image reconstruction