
A Unified Momentum Method with Triple-Parameters and Its Optimal Convergence Rate
Cited by: 1
Abstract Momentum methods have attracted considerable attention in the machine learning community because they can improve the convergence performance of SGD (stochastic gradient descent). Following their successful application in deep learning, many variants of momentum methods have appeared; in particular, two unified frameworks have been proposed: SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum). However, even for nonsmooth convex optimization problems, existing derivations of the optimal average convergence rate still rest on unreasonable restrictions, such as fixing the number of iterations in advance and requiring the problem to be unconstrained. To address this, the paper proposes a more general three-parameter unified momentum method, TPUM (triple-parameters unified momentum), which includes both SUM and QHM as special cases. For constrained nonsmooth convex optimization problems with time-varying step sizes, it then proves that TPUM attains the optimal average convergence rate, and extends the result to the stochastic setting. This guarantees that adding momentum does not degrade the convergence performance of standard gradient descent, and it provides a theoretical guarantee for the applicability of momentum methods to machine learning problems. Experiments on a typical L1-norm-constrained hinge loss optimization problem verify the correctness of the theoretical analysis.
Authors Ding Chengcheng; Tao Wei; Tao Qing (Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031; College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007)
Source Journal of Computer Research and Development (计算机研究与发展), indexed in EI, CSCD, and the PKU Core list, 2020, No. 8, pp. 1571-1580 (10 pages)
Funding National Natural Science Foundation of China (61673394); Natural Science Foundation of Anhui Province (1908085MF193).
Keywords machine learning; optimization algorithm; non-smooth condition; momentum methods; average convergence rate
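The abstract does not state the TPUM update rule, so the sketch below does not implement TPUM. As a rough illustration of the setting the abstract describes, it applies the QHM update (one of the two frameworks TPUM is said to include) to an L1-ball-constrained hinge loss problem with a time-varying step size and returns an average of the iterates. The function names, the step-size schedule `c / sqrt(t)`, the uniform averaging, and the toy data are all assumptions made for illustration; the paper's averaging scheme and parameter choices may differ.

```python
import math


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - radius) / k
        if uk > t:
            theta = t  # the last k satisfying the test fixes the threshold
    # Soft-threshold each coordinate, keeping its sign.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def hinge_loss(w, X, y):
    """Average hinge loss (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        total += max(0.0, 1.0 - margin)
    return total / len(X)


def hinge_subgradient(w, X, y):
    """One subgradient of the average hinge loss at w."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        if margin < 1.0:
            for j, xj in enumerate(xi):
                g[j] -= yi * xj / len(X)
    return g


def projected_qhm(X, y, radius=1.0, steps=200, beta=0.9, nu=0.7, c=1.0):
    """Projected QHM with step size c / sqrt(t); returns the averaged iterate.

    Hypothetical helper: the schedule and uniform averaging are assumptions.
    """
    dim = len(X[0])
    w = [0.0] * dim
    buf = [0.0] * dim  # momentum buffer (exponential average of subgradients)
    avg = [0.0] * dim
    for t in range(1, steps + 1):
        g = hinge_subgradient(w, X, y)
        buf = [beta * b + (1.0 - beta) * gj for b, gj in zip(buf, g)]
        alpha = c / math.sqrt(t)  # time-varying step size, no horizon needed
        # QHM direction: mix of the plain subgradient and the momentum buffer.
        step = [(1.0 - nu) * gj + nu * b for gj, b in zip(g, buf)]
        w = project_l1_ball([wj - alpha * sj for wj, sj in zip(w, step)], radius)
        avg = [a + (wj - a) / t for a, wj in zip(avg, w)]  # running mean
    return avg


# Toy linearly separable data (made up for this sketch).
X = [[1.0, 0.0], [2.0, 0.5], [-1.0, 0.0], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w_avg = projected_qhm(X, y, radius=1.0, steps=200)
```

Setting `nu=0` reduces the update to projected subgradient descent, which makes it easy to compare the averaged iterate with and without momentum on the same data. Because the L1 ball is convex, the averaged iterate stays feasible whenever every projected iterate is.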