
A Combinatory Form Learning Rate Scheduling for Deep Learning Model (cited by 27)
Abstract: A well-designed learning rate schedule can significantly improve the convergence speed of a deep learning model and reduce its training time. The AdaGrad and AdaDec strategies provide only a single learning rate form for all parameters of the model. Targeting this limitation, this paper proposes a combined learning rate strategy, AdaMix, which assigns learning rates according to the characteristics of each parameter type: the connection weights use a learning rate that depends only on the current gradient, while the biases use a power-exponential learning rate. To compare how the strategies affect convergence, an Autoencoder deep learning model is trained to reconstruct the MNIST image database, with the test-stage reconstruction error during the backward fine-tuning phase as the evaluation index. The experimental results show that AdaMix attains a lower reconstruction error with less computation than AdaGrad and AdaDec, and therefore converges faster.
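The abstract does not reproduce the update formulas, but the contrast between the three strategies can be sketched in code. The sketch below is illustrative only: the function names, the constants eta0, r, and c, and the reading of "depends only on the current gradient" as an elementwise normalization by the current gradient magnitude are all assumptions rather than the authors' published formulas. The AdaGrad rule follows Duchi et al. (reference 9), and the power-exponential decay follows the schedule family studied by Senior et al. (reference 10).

```python
import numpy as np

def adagrad_step(param, grad, state, eta0=0.01, eps=1e-8):
    # AdaGrad (Duchi et al., 2011): one rule for every parameter; the
    # accumulated squared gradient grows monotonically, so steps shrink.
    state["g2"] = state.get("g2", 0.0) + grad ** 2
    return param - eta0 * grad / (np.sqrt(state["g2"]) + eps)

def power_lr(t, eta0=0.01, r=1000.0, c=0.75):
    # Power-exponential decay eta_t = eta0 * (1 + t/r)**(-c); the values
    # of eta0, r, c here are placeholder assumptions, not the paper's.
    return eta0 * (1.0 + t / r) ** (-c)

def adamix_step(weights, bias, g_w, g_b, t, eta_w=0.01, eps=1e-8):
    # Hypothetical AdaMix sketch: a different rule per parameter type.
    # Weights: a rate based only on the *current* gradient (here an
    # elementwise normalization, so no history accumulator is stored).
    # Biases: the power-exponential schedule above.
    weights = weights - eta_w * g_w / (np.sqrt(g_w ** 2) + eps)
    bias = bias - power_lr(t) * g_b
    return weights, bias

# Toy check on f(w, b) = (w - 3)^2 + (b + 1)^2, whose minimum is (3, -1).
w, b = 0.0, 0.0
for t in range(5000):
    g_w, g_b = 2.0 * (w - 3.0), 2.0 * (b + 1.0)
    w, b = adamix_step(w, b, g_w, g_b, t)
print(w, b)  # close to (3, -1), up to the fixed 0.01 weight step
```

One design consequence worth noting: under this reading the weight rule keeps no per-parameter history, so it avoids AdaGrad's accumulator and AdaDec's extra bookkeeping, which is at least consistent with the abstract's claim that AdaMix requires less computation.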
Source: Acta Automatica Sinica (《自动化学报》; EI, CSCD, Peking University Core Journal), 2016, No. 6: 953-958 (6 pages).
Funding: Supported by the National Natural Science Foundation of China (61271143).
Keywords: deep learning; learning rate; combined learning scheduling; image reconstruction

References (first 10 of 18)

1. Hinton G. Where do features come from? Cognitive Science, 2014, 38(6): 1078-1101.
2. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444.
3. Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533.
4. Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85-117.
5. Gao Ying-Ying, Zhu Wei-Bin. Modeling visible intermediate layers in deep neural networks. Acta Automatica Sinica, 2015, 41(9): 1627-1637. (in Chinese)
6. Qiao Jun-Fei, Pan Guang-Yuan, Han Hong-Gui. Design and application of a continuous deep belief network. Acta Automatica Sinica, 2015, 41(12): 2138-2146. (in Chinese)
7. Yu D, Deng L. Deep learning and its applications to signal and information processing. IEEE Signal Processing Magazine, 2011, 28(1): 145-154.
8. Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507.
9. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 2011, 12: 2121-2159.
10. Senior A, Heigold G, Ranzato M A, Yang K. An empirical study of learning rates in deep neural networks for speech recognition. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing. Vancouver, BC: IEEE, 2013. 6724-6728.


