Abstract
Stochastic variance reduced methods have recently surged into prominence for solving large-scale optimization problems in machine learning. However, how to choose suitable step sizes for this class of algorithms remains an open question. Inspired by SVRG-BB (stochastic variance reduced gradient with Barzilai-Borwein step size), we propose an adaptive step size for variance reduced methods based on a local estimate of the Lipschitz constant, and we give a framework for selecting the crucial parameter of this step size for different algorithms by solving a minimax problem. We then combine the step size with SARAH (stochastic recursive gradient algorithm) and SVRG (stochastic variance reduced gradient), which yields two algorithms, SARAH-AS (SARAH with adaptive step size) and SVRG-AS (SVRG with adaptive step size). Under the strong convexity assumption, the sequence of distances from the iterates to the optimum is shown to converge linearly for both algorithms. Furthermore, we provide a novel perspective on why SARAH+ performs well in practice. Numerical experiments on public datasets demonstrate that the proposed adaptive step size works well within stochastic variance reduced methods.
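To give a concrete picture of the kind of step-size rule the abstract describes, the following is a minimal Python sketch of an SVRG outer loop whose step size is reset once per epoch from a secant-type local Lipschitz estimate built at consecutive snapshots, in the spirit of SVRG-BB. It is an illustration under stated assumptions, not the paper's SARAH-AS/SVRG-AS update: the specific formula eta = 1/(c*L_est), the safeguard constant c, the function names, and the toy ridge-regression demo are all assumptions, and the minimax rule the paper uses to choose the parameter is not reproduced here.

```python
# Illustrative sketch (assumption: not the paper's exact SARAH-AS/SVRG-AS rule).
# SVRG outer loop with an epoch-wise adaptive step size eta_s ~ 1 / (c * L_s),
# where L_s is a secant estimate of the local Lipschitz constant at snapshots.
import numpy as np

def svrg_adaptive_step(grad_i, full_grad, x0, n, m=100, epochs=20, c=1.0, eta0=0.1):
    x_snap = x0.copy()
    prev_snap, prev_g = None, None
    eta = eta0                                   # fallback step for the first epoch
    for _ in range(epochs):
        g_full = full_grad(x_snap)               # full gradient at the snapshot
        if prev_snap is not None:
            dx = x_snap - prev_snap
            dg = g_full - prev_g
            if np.linalg.norm(dx) > 1e-12:
                L_est = np.linalg.norm(dg) / np.linalg.norm(dx)  # local Lipschitz estimate
                eta = 1.0 / (c * max(L_est, 1e-12))
        prev_snap, prev_g = x_snap.copy(), g_full.copy()
        x = x_snap.copy()
        for _ in range(m):                       # inner SVRG loop
            i = np.random.randint(n)
            v = grad_i(x, i) - grad_i(x_snap, i) + g_full  # variance-reduced gradient
            x = x - eta * v
        x_snap = x                               # snapshot update
    return x_snap

# Tiny ridge-regression demo (synthetic data; all values are assumptions).
rng = np.random.default_rng(0)
n, d, lam = 200, 10, 1e-2
A = rng.normal(size=(n, d)); b = rng.normal(size=n)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i] + lam * x
full_grad = lambda x: A.T @ (A @ x - b) / n + lam * x
x_hat = svrg_adaptive_step(grad_i, full_grad, np.zeros(d), n)
```

The design point the sketch is meant to convey is that the step size is computed from quantities (snapshots and full gradients) the method already maintains, so the adaptivity adds essentially no extra cost per epoch.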
Authors
Yan Liu (刘彦); Tiande Guo (郭田德); Congying Han (韩丛英)
Source
《中国科学:数学》(Scientia Sinica Mathematica)
CSCD; Peking University Core Journals
2021, No. 9, pp. 1433-1450 (18 pages in total)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 11731013, 11571014, 11991022 and U19B2040).
Keywords
stochastic variance reduced methods
adaptive step size
linear convergence rate