
A Reinforcement Learning Method for LQR Control Problem
Cited by: 1
Abstract: Existing convergence analyses of reinforcement learning methods mostly address discrete-state problems; for continuous-state problems, such analyses are limited to simple LQR control problems. This paper analyzes two existing reinforcement learning methods that converge for the LQR control problem and, to remedy their shortcomings, proposes a method that requires only partial model information. The method uses recursive least-squares temporal-difference learning (RLS-TD) to estimate the value-function parameters and recursive least squares (RLS) to estimate the greedily improved policy. A convergence proof for the proposed policy-iteration method is given for the ideal case, and simulation results show that the method converges to the optimal control policy.
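The critic step described in the abstract rests on the fact that, for an LQR problem, the value function of a fixed linear policy is quadratic, V(x) = x'Px, and therefore linear in the quadratic monomials of the state, which is what makes linear TD methods applicable. The Python sketch below illustrates such an RLS-TD(0) critic step on a simulated linear system. It follows the standard recursive least-squares TD recursion (as in Xu, He and Hu, 2002, reference 3 below) rather than the paper's exact formulation; the system matrices, feature map, gain L, and initialization constant are illustrative assumptions, not values from the paper.

    import numpy as np

    def quad_features(x):
        # V(x) = x' P x is linear in the monomials x_i * x_j (i <= j),
        # so the quadratic LQR value function fits a linear TD model.
        n = len(x)
        return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

    class RLSTD:
        # Recursive least-squares TD(0) critic: V(x) ~ theta . phi(x).
        def __init__(self, dim, gamma=0.95, delta=1e3):
            self.theta = np.zeros(dim)    # value-function parameters
            self.C = delta * np.eye(dim)  # inverse-correlation matrix (RLS state)
            self.gamma = gamma

        def update(self, phi, phi_next, cost):
            psi = phi - self.gamma * phi_next        # TD feature difference
            gain = self.C @ phi / (1.0 + psi @ self.C @ phi)
            self.theta = self.theta + gain * (cost - psi @ self.theta)
            self.C = self.C - np.outer(gain, psi @ self.C)

    # Evaluate a fixed stabilizing policy u = -L x on a simulated system
    # x_{t+1} = A x + B u with stage cost x'Qx + u'Ru (values illustrative).
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q, R = np.eye(2), np.eye(1)
    L = np.array([[1.0, 1.5]])
    critic = RLSTD(dim=3)
    x = np.random.randn(2)
    for _ in range(5000):
        u = -L @ x
        cost = x @ Q @ x + u @ R @ u
        x_next = A @ x + B @ u
        critic.update(quad_features(x), quad_features(x_next), cost)
        # Restart when the state decays, to keep the features exciting.
        x = x_next if np.linalg.norm(x_next) > 1e-2 else np.random.randn(2)
    print(critic.theta)  # quadratic coefficients of the learned V

The second half of the paper's method, estimating the greedily improved linear gain by ordinary recursive least squares from the learned quadratic value function and the partially known model, is omitted from this sketch.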
Source: Pattern Recognition and Artificial Intelligence (EI, CSCD, Peking University Core), 2006, No. 3: 406-411 (6 pages)
Keywords: Reinforcement Learning, Recursive Least Squares, Temporal Difference, Optimal Control
Related Literature

References (1)

Secondary References (5)

  • 1. Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge: MIT Press, 1998.
  • 2. Barto A G, Sutton R S, Anderson C W. Neuronlike adaptive elements that can solve difficult learning control problems [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1983, 13(5): 834-846.
  • 3. Xu X, He H G, Hu D W. Efficient reinforcement learning using recursive least-squares methods [J]. Journal of Artificial Intelligence Research, 2002, 16: 259-292.
  • 4. Kimura H, Kobayashi S. An analysis of actor/critic algorithms using eligibility traces: reinforcement learning with imperfect value functions [C]. Proc 15th Int Conf on Machine Learning, Madison, 1998: 278-286.
  • 5. Gu D L, Chen W D, Xi Y G. An adaptive control method based on reinforcement learning [J]. Control and Decision, 2002, 17(4): 473-475. (Cited by: 4)

Co-citing Literature (1)

Co-cited Literature (9)

  • 1. Wen F, Chen Z H, Zhuo R, Zhou G M. A reinforcement learning method with K-means-clustering-based adaptive discretization of continuous states [J]. Control and Decision, 2006, 21(2): 143-147. (Cited by: 7)
  • 2. Chen Z H, Wen F, Nie J B, Wu X S. A reinforcement learning method based on a node-growing k-means clustering algorithm [J]. Journal of Computer Research and Development, 2006, 43(4): 661-666. (Cited by: 13)
  • 3. Ren Y, Chen Z H. Conflict resolution strategies for multi-robot systems based on reinforcement learning algorithms [J]. Control and Decision, 2006, 21(4): 430-434. (Cited by: 7)
  • 4. Prokhorov D V, Wunsch D C. Adaptive critic designs [J]. IEEE Transactions on Neural Networks, 1997, 8(5): 997-1007.
  • 5. Landelius T. Reinforcement learning and distributed local model synthesis [D]. Sweden: Linkoping University, 1997.
  • 6. Prokhorov D V, Wunsch D C. Convergence of critic-based training [C]. Proc IEEE Int Conf on Systems, Man, and Cybernetics, Tokyo, 1997, 4: 3057-3060.
  • 7. Liu X, Balakrishnan S N. Convergence analysis of adaptive critic based optimal control [C]. Proc of American Control Conf, Chicago, 2000: 1929-1933.
  • 8. Bradtke S J. Incremental dynamic programming for on-line adaptive optimal control [D]. University of Massachusetts, 1994.
  • 9. Jagannathan S. Adaptive critic neural network-based controller for nonlinear systems [C]. Proc of IEEE Int Symposium on Intelligent Control, Vancouver, 2002: 303-308.

Citing Literature (1)

Secondary Citing Literature (3)
