
Review of Research on Approximate Reinforcement Learning Algorithms (Cited by: 4)
Abstract: Reinforcement learning (RL) is used to solve optimal decision-making problems when no model of the environment is available, and is one of the most important techniques for artificial intelligence (AI). However, traditional tabular reinforcement learning struggles with control problems that have large-scale or continuous state spaces. Approximate reinforcement learning, inspired by the idea of function approximation, parameterizes the value function or policy function and obtains the optimal policy indirectly through parameter optimization; it has achieved remarkable results in video games, the game of Go, robot control, and other domains. In view of this, this paper reviews the research status and application progress of approximate reinforcement learning algorithms. First, the basic theory of approximate reinforcement learning is introduced. Then the classical algorithms of approximate reinforcement learning are classified and expounded, together with some corresponding improvements. Finally, the research progress of approximate reinforcement learning in robotics is surveyed, and several major open problems are summarized to provide a reference for future research.
Authors: 司彦娜 (SI Yanna), 普杰信 (PU Jiexin), 孙力帆 (SUN Lifan) (School of Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471023, China; School of Information and Communication Engineering, University of Electronic Science and Technology, Chengdu 611731, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2022, Issue 8, pp. 33-44 (12 pages)
Funding: Aeronautical Science Foundation (20185142003); National Defense Basic Scientific Research Program (JCKY2018419C001).
Keywords: reinforcement learning; continuous space; value function approximation; direct policy search; policy gradient
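As a minimal illustration of the value-function-approximation idea the abstract describes (not code from the paper itself): below is a sketch of semi-gradient Q-learning with a linear approximator over radial-basis features, on a toy one-dimensional continuous-state task. The task, feature width, and all hyperparameters are illustrative assumptions.

```python
import math
import random

# Toy 1-D continuous-state task: state s in [0, 1], actions move left/right
# by 0.1, reward 1 on reaching the right edge. Q(s, a) is approximated
# linearly over RBF features, so the continuous state space needs no table.

CENTERS = [i / 4 for i in range(5)]  # RBF centers spread over [0, 1]

def features(s):
    """Radial-basis feature vector phi(s) for a scalar state s."""
    return [math.exp(-(s - c) ** 2 / 0.05) for c in CENTERS]

def q_value(w, s, a):
    """Linear approximation Q(s, a) = w[a] . phi(s), one weight vector per action."""
    return sum(wi * fi for wi, fi in zip(w[a], features(s)))

def train(episodes=500, alpha=0.1, gamma=0.95, eps=0.3, seed=0):
    rng = random.Random(seed)
    w = [[0.0] * len(CENTERS) for _ in range(2)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = rng.random()  # random start state in [0, 1)
        for _ in range(50):
            # epsilon-greedy action selection on the approximate Q-values
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda b: q_value(w, s, b))
            s2 = min(1.0, max(0.0, s + (0.1 if a == 1 else -0.1)))
            done = s2 >= 1.0
            r = 1.0 if done else 0.0
            target = r if done else r + gamma * max(q_value(w, s2, b) for b in (0, 1))
            td_error = target - q_value(w, s, a)
            phi = features(s)
            for i, f in enumerate(phi):
                w[a][i] += alpha * td_error * f  # semi-gradient TD update
            if done:
                break
            s = s2
    return w

w = train()
best = max((0, 1), key=lambda b: q_value(w, 0.5, b))  # greedy action mid-track
```

The optimal policy is recovered indirectly from the learned parameters, exactly as the abstract outlines; replacing the linear approximator with a neural network gives the deep variants used in the video-game and Go applications the paper mentions.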

