Abstract
The Deep Deterministic Policy Gradient (DDPG) algorithm suffers from slow convergence, unstable training, large variance, and poor sample efficiency. To address these problems, this paper proposes SVR-DDPG, a deep deterministic policy gradient algorithm based on the Stochastic Variance Reduced Gradient (SVRG) method. SVRG is incorporated into the parameter update of DDPG, so that the variance of the estimated gradient has a steadily decreasing upper bound. As the variance shrinks, a more accurate gradient direction can be found from a small random training subset, which mitigates the problems caused by approximate gradient estimation error and accelerates convergence. SVR-DDPG and DDPG are both applied to the Pendulum and Mountain Car problems. The experimental results show that SVR-DDPG converges faster and is more stable than the original algorithm, which demonstrates its effectiveness.
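The variance-reduced update that the abstract refers to can be illustrated with a minimal sketch of generic SVRG on a least-squares problem. This is not the authors' SVR-DDPG implementation: the objective, the function name `svrg_least_squares`, and all hyperparameters (`step`, `epochs`, `inner_steps`) are assumptions chosen only for illustration. Each outer loop computes a full gradient at a snapshot of the parameters, and each inner step corrects a single-sample gradient with that snapshot information.

```python
import numpy as np


def svrg_least_squares(X, y, step=0.005, epochs=30, inner_steps=None, seed=0):
    """Generic SVRG on f(w) = 1/(2n) * sum_i (x_i . w - y_i)^2.

    Hypothetical illustration only; hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if inner_steps is None:
        inner_steps = 2 * n  # assumed inner-loop length
    w_snapshot = np.zeros(d)
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer loop.
        full_grad = X.T @ (X @ w_snapshot - y) / n
        w = w_snapshot.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            xi, yi = X[i], y[i]
            grad_w = xi * (xi @ w - yi)              # sample gradient at w
            grad_snap = xi * (xi @ w_snapshot - yi)  # same sample at snapshot
            # Variance-reduced estimate: still unbiased, and its variance
            # shrinks as w and the snapshot approach the optimum.
            w -= step * (grad_w - grad_snap + full_grad)
        w_snapshot = w
    return w_snapshot
```

In a DDPG setting the same correction would presumably be applied to minibatch gradients of the network parameters instead of single-sample gradients of a linear model; the record above does not give those details, so they are left out here.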
Authors
杨薛钰 (YANG Xueyu)
陈建平 (CHEN Jianping)
傅启明 (FU Qiming)
陆悠 (LU You)
吴宏杰 (WU Hongjie)
Affiliations
School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
Zhuhai Mizao Intelligent Technology Co., Ltd., Zhuhai, Guangdong 519000, China
Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2021, Issue 19, pp. 104-111 (8 pages)
Funding
National Natural Science Foundation of China (61876217, 61876121, 61772357, 61750110519, 61772355, 61702055, 61672371)
Key Research and Development Program of Jiangsu Province (BE2017663).