
Adam revisited: a weighted past gradients perspective (Cited by: 2)

Abstract: Adaptive learning rate methods have been successfully applied in many fields, especially in training deep neural networks. Recent results have shown that adaptive methods with exponentially increasing weights on squared past gradients (i.e., ADAM, RMSPROP) may fail to converge to the optimal solution. Though many algorithms, such as AMSGRAD and ADAMNC, have been proposed to fix the non-convergence issues, achieving a data-dependent regret bound similar to or better than ADAGRAD's remains a challenge for these methods. In this paper, we propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issues. Unlike AMSGRAD and ADAMNC, we use a milder weighting strategy on squared past gradients, in which the weights grow linearly. Based on this idea, we propose the weighted adaptive gradient method framework (WAGMF) and implement the WADA algorithm within this framework. Moreover, we prove that WADA achieves a weighted data-dependent regret bound, which can be better than the original regret bound of ADAGRAD when the gradients decrease rapidly. This bound may partially explain the good performance of ADAM in practice. Finally, extensive experiments demonstrate the effectiveness of WADA and its variants in comparison with several variants of ADAM on training convex problems and deep neural networks.
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2020, Issue 5, pp. 61-76 (16 pages).
Funding: We thank the anonymous reviewers for their insightful comments and discussions. This research was partially supported by grants from the National Key Research and Development Program of China (2018YFB1004300) and the National Natural Science Foundation of China (Grant Nos. 61703386, 61727809, and U1605251).
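
The linear weighting idea described in the abstract can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical illustration based only on the abstract's description (the t-th squared gradient receives a weight proportional to t, rather than ADAGRAD's uniform weights or ADAM's exponentially growing weights); it omits momentum and other details of the authors' WAGMF/WADA algorithms, and all names (wada_sketch, grad_fn, lr) are illustrative, not the authors' reference implementation.

```python
import numpy as np

def wada_sketch(grad_fn, x0, lr=0.1, n_steps=100, eps=1e-8):
    """Illustrative WADA-style update (not the authors' reference code).

    Accumulates squared past gradients with linearly growing weights
    (step t gets weight t), as described in the abstract, instead of
    ADAGRAD's uniform weights or ADAM's exponential weights.
    """
    x = np.asarray(x0, dtype=float)
    weighted_sq = np.zeros_like(x)   # sum over k <= t of k * g_k^2
    weight_sum = 0.0                 # sum over k <= t of k, used to normalize
    for t in range(1, n_steps + 1):
        g = grad_fn(x)
        weighted_sq += t * g**2      # linear weight on the squared gradient
        weight_sum += t
        v = weighted_sq / weight_sum # weighted average of squared gradients
        x -= lr * g / (np.sqrt(v) + eps)
    return x

# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x_star = wada_sketch(lambda x: 2 * x, x0=[5.0, -3.0])
print(x_star)  # should approach the origin
```

Because recent squared gradients receive the largest weights while older ones are never discarded, this kind of scheme sits between ADAGRAD (all history counted equally) and ADAM (history forgotten exponentially fast), which is the trade-off the paper formalizes through its weighted data-dependent regret bound.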