Deep Deterministic Policy Gradient Algorithm Based on Stochastic Variance Reduction Method

Cited by: 2
Abstract: The Deep Deterministic Policy Gradient (DDPG) algorithm suffers from slow convergence, unstable training, large variance, and poor sample efficiency. To address these problems, this paper proposes a deep deterministic policy gradient algorithm based on the stochastic variance reduced gradient method (SVR-DDPG). The algorithm incorporates the Stochastic Variance Reduced Gradient (SVRG) technique into the parameter update process of DDPG. With the SVRG update rule, the variance of the estimated gradient has an upper bound that decreases continuously, so a more accurate gradient direction can be found from a small random training subset. This resolves the problems caused by errors in the approximate gradient estimate and speeds up convergence. SVR-DDPG and DDPG are both applied to the Pendulum and Mountain Car problems. The experimental results show that SVR-DDPG converges faster and is more stable than the original algorithm, which demonstrates the effectiveness of the algorithm.
Authors: YANG Xueyu; CHEN Jianping; FU Qiming; LU You; WU Hongjie (School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; Zhuhai Mizao Intelligent Technology Co., Ltd., Zhuhai, Guangdong 519000, China; Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journal list, 2021, No. 19, pp. 104-111 (8 pages)
Funding: National Natural Science Foundation of China (61876217, 61876121, 61772357, 61750110519, 61772355, 61702055, 61672371); Key Research and Development Program of Jiangsu Province (BE2017663).
Keywords: deep reinforcement learning; Deep Q-Network (DQN); Deep Deterministic Policy Gradient (DDPG); stochastic variance reduced gradient techniques
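The abstract says the SVRG update gives the gradient estimate a shrinking variance bound while still using small random subsets. As a rough illustration of that generic update rule only, the sketch below runs SVRG on a toy least-squares problem. It is not the paper's SVR-DDPG: the function name `svrg_least_squares`, the step size, the epoch count, and the synthetic data are all assumptions made for this example, and no actor or critic network is involved.

```python
import numpy as np


def svrg_least_squares(X, y, step=0.01, epochs=30, inner_steps=None, seed=0):
    """Minimise (1/2n) * ||X w - y||^2 with generic SVRG (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    inner_steps = inner_steps or 2 * n  # assumed inner-loop length
    w_snapshot = np.zeros(d)
    for _ in range(epochs):
        # Full gradient at the snapshot: the reference point for variance reduction.
        full_grad = X.T @ (X @ w_snapshot - y) / n
        w = w_snapshot.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)  # one random sample per inner step
            xi, yi = X[i], y[i]
            grad_w = xi * (xi @ w - yi)              # stochastic gradient at w
            grad_snap = xi * (xi @ w_snapshot - yi)  # same sample, at the snapshot
            # Variance-reduced direction: unbiased, and its variance shrinks
            # as both w and the snapshot approach the optimum.
            w -= step * (grad_w - grad_snap + full_grad)
        w_snapshot = w
    return w_snapshot


# Toy usage on synthetic, noise-free data (all values here are assumptions).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 5))
w_true = data_rng.standard_normal(5)
y = X @ w_true
w_hat = svrg_least_squares(X, y)
print("max abs error:", np.max(np.abs(w_hat - w_true)))
```

The correction term `grad_w - grad_snap` is what distinguishes this from plain stochastic gradient descent. In a DDPG setting the same idea would presumably be applied to the network parameter gradients computed from replay-buffer minibatches, but the details of how the paper does so are not given in this record.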