
An Actor-Critic Learning Approach Based on Joint Neural Network in Continuous Space
Abstract: In complex continuous-space application scenarios, classical discrete-space reinforcement learning methods can no longer meet practical needs, while existing continuous-space reinforcement learning methods mainly rely on linear fitting to approximate the state value function and the action selection function, which limits their accuracy. This paper proposes a nonlinear actor-critic approach based on a joint (union) neural network (UNN-AC). The method expresses the action selection function and the critic value function as a single joint neural network model and uses this network to fit the state value function and the action selection probability nonlinearly. Compared with existing linear fitting methods, the nonlinear UNN-AC improves the fitting accuracy of both the critic value function and the action selection function. Experimental results show that the UNN-AC algorithm can effectively solve the approximate optimal policy problem in continuous space, and that, compared with classical continuous-action-space algorithms, it converges faster and is more stable.
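To make the abstract's idea concrete, below is a minimal sketch of a joint (shared-trunk) actor-critic network for a continuous action space, trained with a one-step TD advantage update. It only illustrates the general technique the abstract describes; the layer sizes, Gaussian policy head, optimizer, and update rule are assumptions for illustration, not the authors' UNN-AC implementation.

```python
# Sketch: one network ("joint neural network") produces both the continuous
# action distribution (actor) and the state value (critic).
# All hyperparameters and names are illustrative assumptions.
import torch
import torch.nn as nn


class JointActorCritic(nn.Module):
    """Shared nonlinear trunk with an actor head and a critic head."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        # Shared nonlinear feature extractor used by both heads.
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean = nn.Linear(hidden, action_dim)              # actor: action mean
        self.log_std = nn.Parameter(torch.zeros(action_dim))   # actor: log std dev
        self.value = nn.Linear(hidden, 1)                      # critic: V(s)

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        dist = torch.distributions.Normal(self.mean(h), self.log_std.exp())
        return dist, self.value(h).squeeze(-1)


def td_actor_critic_step(net, opt, s, a, r, s_next, done, gamma=0.99):
    """One-step TD(0) advantage actor-critic update on a single transition."""
    dist, v = net(s)
    with torch.no_grad():
        _, v_next = net(s_next)
        target = r + gamma * (1.0 - done) * v_next
    advantage = (target - v).detach()
    critic_loss = (target - v).pow(2)                    # fit V(s) to the TD target
    actor_loss = -dist.log_prob(a).sum(-1) * advantage   # policy-gradient term
    loss = (actor_loss + 0.5 * critic_loss).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Hypothetical usage on a 3-dimensional state and 1-dimensional action:
# net = JointActorCritic(state_dim=3, action_dim=1)
# opt = torch.optim.Adam(net.parameters(), lr=3e-4)
```

Because the two heads share one trunk, gradients from the critic loss also shape the features used by the policy, which is one plausible reading of why a joint nonlinear model can fit both the value function and the action selection probability more accurately than separate linear approximators.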
Authors: YANG Jinhong; TAN Bin; HUANGFU Li; XIONG Zhang (Systems Engineering Research Institute of CSSC, Beijing 100094, China; College of Computer Science & Technology, Beihang University, Beijing 100192, China)
Source: 《智能安全》 (Intelligent Security), 2022, No. 2, pp. 19-25 (7 pages)
Keywords: joint neural network; continuous space; actor-critic; nonlinear
