Policy Iteration Reinforcement Learning Based on Geodesic Gaussian Basis Defined on State-action Graph
Abstract: In policy iteration reinforcement learning, the construction of basis functions is an important factor influencing the accuracy of action-value function approximation. To provide appropriate basis functions for action-value function approximation, a policy iteration reinforcement learning method based on geodesic Gaussian bases defined on a state-action graph is proposed. First, a graph-theoretic state-action description of the Markov decision process is built according to an off-policy method. Second, geodesic Gaussian kernel functions are defined on the state-action graph, and a kernel sparsification approach based on approximate linear dependency is used to select the centers of the geodesic Gaussian kernels automatically. Finally, in the policy evaluation stage, the geodesic Gaussian kernels defined on the state-action graph are used to approximate the action-value function, and the policy is then improved based on the estimated value function. Simulation results on a 10 × 10 grid world show that, compared with policy iteration reinforcement learning methods based on either ordinary Gaussian bases or geodesic Gaussian bases defined on a state graph, the proposed method approximates an action-value function that is smooth in some regions and discontinuous in others with high accuracy and fewer basis functions, and thus obtains the optimal policy effectively.
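The core of the method is to replace the Euclidean distance inside an ordinary Gaussian basis function with the shortest-path (geodesic) distance measured on the state-action graph, and to thin out the resulting kernel set with an approximate-linear-dependency (ALD) test. The following is a minimal Python sketch of these two steps under stated assumptions: the NetworkX graph, the (state, action) node labels, the kernel width sigma, and the ALD threshold nu are illustrative choices, not the authors' exact implementation.

```python
import numpy as np
import networkx as nx

def geodesic_gaussian_kernel(G, sigma=1.0):
    """k(u, v) = exp(-d(u, v)^2 / (2 * sigma^2)), where d(u, v) is the
    shortest-path (geodesic) distance between nodes of the state-action graph G."""
    dist = dict(nx.all_pairs_shortest_path_length(G))

    def k(u, v):
        d = dist[u].get(v, np.inf)          # unreachable pairs get zero similarity
        return float(np.exp(-(d ** 2) / (2.0 * sigma ** 2)))

    return k

def ald_select_centers(nodes, k, nu=0.1):
    """Approximate linear dependency (ALD) sparsification: a node becomes a new
    kernel center only if its feature-space image is not approximately a linear
    combination of the centers selected so far."""
    centers = []
    for x in nodes:
        if not centers:
            centers.append(x)
            continue
        K = np.array([[k(a, b) for b in centers] for a in centers])
        kx = np.array([k(c, x) for c in centers])
        coeffs = np.linalg.solve(K + 1e-8 * np.eye(len(centers)), kx)
        delta = k(x, x) - kx @ coeffs       # ALD test statistic
        if delta > nu:
            centers.append(x)
    return centers

# Illustrative usage: nodes are (state, action) pairs and edges follow
# transitions observed in off-policy sample data (toy graph, made up here).
G = nx.Graph()
G.add_edges_from([((0, "right"), (1, "right")),
                  ((1, "right"), (2, "up")),
                  ((2, "up"),    (3, "right")),
                  ((0, "right"), (2, "up"))])
k = geodesic_gaussian_kernel(G, sigma=1.0)
centers = ald_select_centers(list(G.nodes), k, nu=0.1)

# Basis features phi(s, a) for least-squares policy evaluation.
def phi(x):
    return np.array([k(x, c) for c in centers])

print(centers)
print(phi((3, "right")))
```

In the full policy iteration loop described in the abstract, the feature vector phi(s, a) built from these centers would feed a least-squares policy evaluation step, after which the policy is improved greedily with respect to the estimated action-value function.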
Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journal list, 2011, No. 1, pp. 44-51 (8 pages)
Funding: Supported by the National Natural Science Foundation of China (60804022, 60974050, 61072094), the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-08-0836), the Fok Ying Tung Education Foundation Fund for Young Teachers (121066), and the Natural Science Foundation of Jiangsu Province (BK2008126)
Keywords: state-action graph; geodesic Gaussian kernel; basis function; policy iteration; reinforcement learning