Journal Article

Individual Convergence of NAG with Biased Gradient in Nonsmooth Cases
(梯度有偏情形非光滑问题NAG的个体收敛性)
Cited by: 2
Abstract: Stochastic optimization methods have become the first choice for large-scale regularization and deep learning problems, and their convergence rates are usually established under the assumption that the gradient of the objective function is estimated without bias. In machine learning, however, many situations give rise to biased gradients. Unlike the unbiased case, the well-known Nesterov accelerated gradient (NAG) method accumulates the gradient bias from each iteration, so the optimal convergence rate may no longer be attainable and convergence itself may fail. Recent results show that NAG also accelerates the individual (last-iterate) convergence of the projected subgradient method for nonsmooth problems, but the effect of biased subgradients on this property has not been studied. This study proves that, for nonsmooth optimization problems, NAG attains a stable individual convergence bound when the subgradient bias is bounded, and that the optimal individual convergence rate is still achieved when the bias decays at a suitable rate. As an application, an inexact projected subgradient method is derived in which the projection need not be computed exactly; it reaches a stable learning accuracy more quickly while retaining convergence. Experiments confirm the theoretical analysis and the performance of the inexact method.
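To make the algorithmic setting concrete, the following is a minimal Python sketch of a Nesterov-style projected subgradient iteration in which the subgradient oracle may be biased and the projection is only computed approximately. The function name, step-size schedule, and momentum schedule below are illustrative placeholders; they are not the exact construction or parameter choices analyzed in the paper.

```python
import numpy as np

def nag_inexact_projected_subgradient(subgrad, project_approx, x0, n_iters,
                                      step=lambda t: 1.0 / np.sqrt(t + 1.0),
                                      momentum=lambda t: t / (t + 3.0)):
    """Nesterov-style projected subgradient sketch with an approximate
    projection and a possibly biased subgradient oracle.

    subgrad(y)        : returns a (possibly biased) subgradient at y
    project_approx(z) : approximate projection of z onto the feasible set
    step, momentum    : illustrative schedules, not the ones from the paper
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    x_curr = x_prev.copy()
    for t in range(n_iters):
        # extrapolation (momentum) step
        y = x_curr + momentum(t) * (x_curr - x_prev)
        # subgradient step followed by an approximate projection;
        # any bias in subgrad(y) or error in project_approx enters here
        x_next = project_approx(y - step(t) * subgrad(y))
        x_prev, x_curr = x_curr, x_next
    # "individual convergence" concerns this last iterate itself,
    # not an averaged sequence of iterates
    return x_curr
```

The sketch returns the final iterate rather than an average, since individual convergence is a statement about the last iterate. A toy usage, again purely illustrative, minimizes an L1 objective over a Euclidean ball whose projection is deliberately perturbed:

```python
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
R = 5.0
subgrad = lambda x: A.T @ np.sign(A @ x - b)           # subgradient of ||Ax - b||_1
project_approx = lambda z: np.round(                   # exact ball projection, then
    z * min(1.0, R / max(np.linalg.norm(z), 1e-12)),   # coarse rounding makes it inexact
    3)
x_last = nag_inexact_projected_subgradient(subgrad, project_approx,
                                            np.zeros(10), n_iters=2000)
```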
Authors: LIU Yu-Xiang (刘宇翔), CHENG Yu-Jia (程禹嘉), TAO Qing (陶卿), Department of Information Engineering, PLA Army Academy of Artillery and Air Defense, Hefei 230031, China
Source: Journal of Software (软件学报), 2020, No. 4, pp. 1051-1062 (12 pages); indexed in EI, CSCD, and the Peking University Core Journal list (北大核心)
Funding: National Natural Science Foundation of China (61673394).
Keywords: machine learning; Nesterov accelerated gradient; stochastic optimization; biased gradient; individual convergence