Abstract
The dual averaging method is a promising optimization algorithm. By skillfully exploiting the information of all past gradients, it overcomes the vanishing-gradient drawback that traditional first-order gradient methods cannot escape, and attains a stable convergence rate. Similarly, the momentum method also exploits past gradient information, with the aim of effectively escaping local minima and saddle points in non-convex optimization. In recent years the momentum method has also become widely active in convex optimization, not only accelerating ordinary gradient descent but also achieving the optimal individual convergence rate in the absence of smoothness conditions. This paper surveys the current research status and open problems of the dual averaging and momentum methods, analyzes the connections and differences between the two, and on this basis points out several problems worth further study.
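The abstract contrasts two ways of reusing past gradients: dual averaging drives the iterate with the running sum of all past gradients, while momentum folds them into an exponentially weighted velocity. A minimal illustrative sketch of both update rules on a simple convex quadratic follows; the test function, step sizes, and damping sequence are hypothetical choices for demonstration, not taken from the paper.

```python
import numpy as np

def grad(x):
    # gradient of the test function f(x) = 0.5 * ||x||^2 (minimizer at 0)
    return x

def dual_averaging(x0, steps=200):
    # Simple dual averaging: the iterate is determined by the running sum
    # of ALL past gradients, damped by an increasing sequence beta_t.
    x, g_sum = x0.copy(), np.zeros_like(x0)
    for t in range(1, steps + 1):
        g_sum += grad(x)
        x = -g_sum / np.sqrt(t)   # beta_t = sqrt(t), an illustrative choice
    return x

def heavy_ball(x0, lr=0.1, mu=0.9, steps=200):
    # Polyak heavy-ball momentum: past gradients enter through the
    # exponentially weighted velocity term v.
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v = mu * v - lr * grad(x)
        x = x + v
    return x

x0 = np.array([5.0, -3.0])
print(np.linalg.norm(dual_averaging(x0)))  # both approach the minimizer 0
print(np.linalg.norm(heavy_ball(x0)))
```

Both loops converge to the minimizer here; the structural difference the survey examines is that dual averaging weights every past gradient through the sum `g_sum`, whereas momentum discounts older gradients geometrically through `v`.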
Author
QU Junyi (Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031)
Source
Computer & Digital Engineering, 2022, No. 11, pp. 2443-2448
Keywords
machine learning
dual averaging
momentum method
individual convergence rate
sparsity