
Probably approximately correct reinforcement learning for solving continuous-state control problems
Abstract: Online learning time is an important metric for reinforcement learning (RL) algorithms. Conventional online RL algorithms such as Q-learning and state-action-reward-state-action (SARSA) cannot give a quantitative, theoretically derived upper bound on the online learning time. In this paper we apply the principle of probably approximately correct (PAC) learning to design data-driven online RL algorithms for continuous-time deterministic systems. These algorithms record online data efficiently while still providing the exploration of the state space that online RL requires, and they output a near-optimal control policy within a finite online learning time. We propose two implementations of the algorithm, based respectively on state discretization and on kd-trees (k-dimensional trees), to store data and compute online policies. Finally, both algorithms are applied to the motion control of a two-link manipulator to observe and compare their performance.
Authors: ZHU Yuan-heng, ZHAO Dong-bin (State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Source: Control Theory & Applications (《控制理论与应用》), 2016, No. 12: 1603–1613 (11 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal (北大核心).
Funding: Supported by the National Natural Science Foundation of China (61273136, 61573353, 61533017, 61603382) and the Excellent Talent Fund of the State Key Laboratory of Management and Control for Complex Systems.
Keywords: reinforcement learning; probably approximately correct; kd-tree; two-link manipulator
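
The abstract names two data-storage schemes, state discretization and kd-trees, without giving implementation detail. The following is a minimal Python sketch of how each scheme might store online observations and answer policy queries; it is an illustration under stated assumptions, not the authors' algorithm. All names here (GridPolicy, KDPolicy, cell size h, tolerance eps, the default exploratory action) are hypothetical, and the kd-tree used is scipy.spatial.cKDTree.

    # Illustrative sketch (not the paper's exact PAC algorithm) of the two
    # storage schemes the abstract mentions: (1) state discretization via a
    # grid-cell dictionary and (2) nearest-neighbour lookup via a kd-tree.
    import numpy as np
    from scipy.spatial import cKDTree

    class GridPolicy:
        """Store one action per discretized grid cell of the state space."""
        def __init__(self, h=0.1):
            self.h = h        # hypothetical grid cell size (resolution)
            self.table = {}   # cell index tuple -> stored action

        def _cell(self, state):
            # Map a continuous state to its integer grid-cell index.
            return tuple(np.floor(np.asarray(state, dtype=float) / self.h).astype(int))

        def record(self, state, action):
            self.table[self._cell(state)] = action

        def act(self, state, default=0.0):
            # Unvisited cells fall back to a default (exploratory) action.
            return self.table.get(self._cell(state), default)

    class KDPolicy:
        """Store raw observations and answer queries by nearest neighbour."""
        def __init__(self, eps=0.1):
            self.eps = eps    # hypothetical distance below which a state is "known"
            self.states, self.actions = [], []
            self.tree = None

        def record(self, state, action):
            self.states.append(np.asarray(state, dtype=float))
            self.actions.append(action)
            # Rebuilding on every insert is O(n log n); a real implementation
            # would batch rebuilds or use an incremental structure.
            self.tree = cKDTree(np.vstack(self.states))

        def act(self, state, default=0.0):
            if self.tree is None:
                return default
            dist, idx = self.tree.query(np.asarray(state, dtype=float), k=1)
            # Far from all recorded data: return the exploratory default.
            return self.actions[idx] if dist <= self.eps else default

For example, after kd = KDPolicy(eps=0.05) and kd.record([0.10, -0.20], 1.0), a query kd.act([0.12, -0.19]) returns 1.0 because the nearest stored state lies within eps, while kd.act([2.0, 3.0]) falls back to the default action, marking that region of the state space as still unexplored. The grid variant replaces the distance test with a fixed resolution h, which mirrors the discretization-versus-kd-tree trade-off the paper compares.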

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部