Sarsa Reinforcement Learning Algorithm Based on Neural Networks (基于神经网络的Sarsa强化学习算法)  Cited by: 4

Abstract: The standard Sarsa algorithm requires the state space to be discrete and small. In many practical systems, however, the state space is continuous, or discrete but very large, so storing every state-action pair (SAP) in a table would require far too much memory. This paper proposes keeping the SAPs in a queue of BP networks instead; experiments show that this resolves the problem of representing Q-values when the state space is too large for a table.
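The record only outlines the idea, so the following is a rough sketch of the general technique the abstract describes: replacing the tabular Q-store in Sarsa with a back-propagation network used as a function approximator. The `BPQNetwork` class, the `epsilon_greedy` helper, and the Gym-style `env.reset()` / `env.step()` interface are assumptions made here for illustration; the paper's actual proposal, a queue of BP networks holding the SAPs, is not reproduced.

```python
# Minimal sketch, NOT the paper's exact method: Sarsa with a single
# one-hidden-layer BP (back-propagation) network approximating Q(s, a),
# so no tabular state-action-pair storage is needed.
# `env` is a hypothetical Gym-like environment: reset() -> state vector,
# step(a) -> (next_state, reward, done).
import numpy as np


class BPQNetwork:
    """One-hidden-layer network mapping a state vector to one Q-value per action."""

    def __init__(self, n_state, n_action, n_hidden=32, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_state, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_action))
        self.b2 = np.zeros(n_action)
        self.lr = lr

    def forward(self, s):
        self.h = np.tanh(s @ self.W1 + self.b1)  # hidden activations, cached for backprop
        return self.h @ self.W2 + self.b2        # Q-value estimates for all actions

    def update(self, s, a, td_target):
        """One back-propagation step on the squared TD error of the taken action."""
        q = self.forward(s)
        err = q[a] - td_target                           # dLoss/dq[a] for loss = 0.5 * err**2
        one_hot = np.eye(len(q))[a]
        dh = err * self.W2[:, a] * (1.0 - self.h ** 2)   # backprop through tanh
        self.W2 -= self.lr * np.outer(self.h, one_hot * err)
        self.b2 -= self.lr * one_hot * err
        self.W1 -= self.lr * np.outer(s, dh)
        self.b1 -= self.lr * dh


def epsilon_greedy(net, s, epsilon):
    q = net.forward(s)
    if np.random.rand() < epsilon:
        return np.random.randint(len(q))
    return int(np.argmax(q))


def sarsa(env, net, episodes=500, gamma=0.99, epsilon=0.1):
    """On-policy Sarsa: the target uses the action actually taken in the next state."""
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(net, s, epsilon)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = epsilon_greedy(net, s_next, epsilon)
            target = r if done else r + gamma * net.forward(s_next)[a_next]
            net.update(s, a, target)
            s, a = s_next, a_next
```

With an approximator like this, memory cost is fixed by the network size rather than by the number of state-action pairs, which is the point the abstract makes.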
Source: Computer Technology and Development (《计算机技术与发展》), 2006, No. 1, pp. 30-32 (3 pages).
Keywords: reinforcement learning; agent; MDP (Markov decision process); BP (back propagation) network; SAP (state-action pair)

References (5)

  • 1 Astrom K J. Optimal control of Markov processes with incomplete state information[J]. Journal of Mathematical Analysis and Applications, 1965, 10: 174-205.
  • 2 Tsitsiklis J N, Van Roy B. An analysis of temporal-difference learning with function approximation[J]. IEEE Transactions on Automatic Control, 1997, 42(5): 674-690.
  • 3 Tesauro G. TD-Gammon, a self-teaching backgammon program[J]. Neural Computation, 1994, 6(2): 215-219.
  • 4 Sutton R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3: 9-44.
  • 5 Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. Cambridge, MA: MIT Press, 1998.
