
CONVERGENCE OF BACKPROPAGATION WITH MOMENTUM FOR NETWORK ARCHITECTURES WITH SKIP CONNECTIONS

Abstract: We study a class of deep neural networks with architectures that form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed-forward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe an example of compression through an AutoEncoder and compare against sequential feed-forward networks under several metrics.
Source: Journal of Computational Mathematics (SCIE, CSCD), 2021, No. 1, pp. 147-158 (12 pages).
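The training rule named in the abstract, gradient descent with an adaptive momentum term, is presumably of the classical heavy-ball form w_{k+1} = w_k - eta * grad E(w_k) + tau_k * (w_k - w_{k-1}). Below is a minimal sketch of that update on a toy least-squares objective; the particular adaptive schedule for tau_k and the objective itself are illustrative assumptions, not the scheme or DAG network analyzed in the paper.

```python
import numpy as np

# Minimal sketch: gradient descent with a momentum (heavy-ball) term,
#   w_{k+1} = w_k - eta * grad E(w_k) + tau_k * (w_k - w_{k-1}),
# run on a toy least-squares objective. The adaptive rule for tau_k used
# here (shrinking the momentum as the gradient shrinks) is a hypothetical
# placeholder, not the schedule analyzed in the paper.

def grad_E(w, X, y):
    """Gradient of E(w) = (1/2n) * ||X w - y||^2."""
    return X.T @ (X @ w - y) / len(y)

def momentum_descent(X, y, eta=0.1, tau=0.9, iters=1000):
    w = np.zeros(X.shape[1])
    w_prev = w.copy()
    for _ in range(iters):
        g = grad_E(w, X, y)
        # Hypothetical adaptive momentum: damp tau_k as the gradient shrinks.
        tau_k = tau * min(1.0, np.linalg.norm(g))
        w_next = w - eta * g + tau_k * (w - w_prev)
        w_prev, w = w, w_next
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ true_w + 0.01 * rng.normal(size=100)
    print("recovered weights:", np.round(momentum_descent(X, y), 2))
```

On this toy problem the iterates recover the planted weights; the paper's contribution is a proof that convergence of this kind of momentum iteration extends to DAG architectures with skip connections and a large class of nonlinear activations.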

