
多步截断SARSA强化学习算法 (Cited by 5)

An algorithm of reinforcement learning for a truncated multi-step SARSA
Abstract: A new on-policy reinforcement learning algorithm is proposed. Its basic idea is to follow a given learning policy and use the information of k (k > 1) steps to estimate the TD(λ) return, thereby speeding up the update of the estimated optimal action values. The resulting updates are faster than those of SARSA(0), while requiring less computation than SARSA(λ).
Source: Journal of Guangxi University of Technology (《广西工学院学报》), CAS, 2002, No. 1, pp. 1-4 (4 pages).
Keywords: reinforcement learning, Markov decision processes (MDPs), Q-learning, SARSA learning, machine learning, truncated multi-step SARSA reinforcement learning algorithm
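
The abstract describes the core idea: replace the one-step SARSA(0) target with a truncated k-step return, i.e. the next k rewards plus a bootstrapped Q-value at step k, under the same on-policy behaviour. The sketch below illustrates that idea in tabular form. It is a minimal reconstruction based only on the abstract, not the paper's published pseudocode; the environment interface (env.reset(), env.step(), env.n_states, env.n_actions) and all parameter names are hypothetical stand-ins.

```python
# Minimal sketch of a truncated k-step SARSA update (tabular, epsilon-greedy).
# Assumed/hypothetical environment interface: env.reset() -> s,
# env.step(a) -> (s_next, reward, done), plus env.n_states and env.n_actions.
import numpy as np


def epsilon_greedy(Q, s, n_actions, eps, rng):
    """On-policy action selection, used both for behaviour and for bootstrapping."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))


def truncated_k_step_sarsa(env, k=3, alpha=0.1, gamma=0.95, eps=0.1,
                           episodes=500, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, env.n_actions, eps, rng)
        window = []          # sliding window of the last k transitions (s, a, r)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = None if done else epsilon_greedy(Q, s_next, env.n_actions, eps, rng)
            window.append((s, a, r))
            if len(window) == k and not done:
                # Truncated k-step return:
                # G = r_1 + g*r_2 + ... + g^(k-1)*r_k + g^k * Q(s_{t+k}, a_{t+k})
                G = sum((gamma ** i) * ri for i, (_, _, ri) in enumerate(window))
                G += (gamma ** k) * Q[s_next, a_next]
                s0, a0, _ = window.pop(0)
                Q[s0, a0] += alpha * (G - Q[s0, a0])
            s, a = s_next, a_next
        # Episode over: flush the window with plain (non-bootstrapped) returns,
        # since there is no Q-value to bootstrap from at a terminal state.
        for j in range(len(window)):
            G = sum((gamma ** i) * window[j + i][2] for i in range(len(window) - j))
            s0, a0, _ = window[j]
            Q[s0, a0] += alpha * (G - Q[s0, a0])
    return Q
```

With k = 1 the update reduces to ordinary SARSA(0); a larger k propagates reward information further back per update at the cost of a short delay, without the full eligibility-trace bookkeeping that SARSA(λ) requires, which matches the abstract's claim of faster updates than SARSA(0) at lower cost than SARSA(λ).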

References (6)

  • 1 Watkins C J C H. Learning from delayed rewards [D]. Cambridge University, England, 1989.
  • 2 Sutton R S. Learning to predict by the methods of temporal differences [J]. Machine Learning, 1988 (3): 9-44.
  • 3 Peng J, Williams R. Incremental multi-step Q-learning [J]. Machine Learning, 1996 (22): 283-290.
  • 4 Rummery G A, Niranjan M. On-line Q-learning using connectionist systems [R]. CUED/F-INFENG/TR 166, Cambridge University, UK, 1994.
  • 5 Bertsekas D P. Dynamic programming: deterministic and stochastic models [M]. Prentice Hall, USA, 1987.
  • 6 Sutton R S, Barto A G. An introduction to reinforcement learning [M]. The MIT Press, USA, 1998.

