SARSA(λ) Algorithm of Reinforcement Learning Based on States Clustering

Cited by: 3
Abstract: To solve reinforcement learning problems with large state spaces, a SARSA(λ) reinforcement learning algorithm based on state clustering is proposed. The basic idea is to use prior knowledge, or a controller trained in advance, to partition the state space into clusters, and then to run SARSA(λ) learning over the cluster space rather than over the original states. If the states are clustered appropriately, the algorithm can obtain a relatively good approximate value function.
Source: Computer Engineering (《计算机工程》), 2003, No. 5, pp. 37-38, 98 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Keywords: SARSA(λ) reinforcement learning algorithm; state clustering; reinforcement learning; function approximation; SARSA learning; artificial neural network
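
The idea described in the abstract (map each state to a cluster, then learn SARSA(λ) action values per cluster) can be illustrated with a minimal Python sketch. The abstract does not specify the environment, the clustering rule, or the hyperparameters, so everything here is an assumption made for illustration: a toy 20-state chain task (`step`), a fixed-size binning rule standing in for prior knowledge (`cluster_of`), accumulating eligibility traces, and the chosen values of `alpha`, `gamma`, `lam`, and `epsilon`. It should not be read as the authors' implementation.

```python
import random
from collections import defaultdict


def cluster_of(state, cluster_size=5):
    """Hypothetical clustering rule: bin adjacent states into fixed-size clusters."""
    return state // cluster_size


def step(state, action, n_states=20):
    """Toy chain task (an assumption): action 1 moves right, action 0 moves left.

    The episode ends with reward 1.0 when the right-most state is reached.
    """
    move = 1 if action == 1 else -1
    next_state = max(0, min(n_states - 1, state + move))
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q, cluster, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties random)."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    values = [q[(cluster, a)] for a in (0, 1)]
    best = max(values)
    return rng.choice([a for a in (0, 1) if values[a] == best])


def sarsa_lambda_clustered(episodes=200, alpha=0.1, gamma=0.95, lam=0.8,
                           epsilon=0.1, seed=0, max_steps=1000):
    """Tabular SARSA(lambda) where the table is indexed by (cluster, action)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # action values over the cluster space
    for _ in range(episodes):
        traces = defaultdict(float)  # eligibility traces, reset each episode
        state = 0
        cluster = cluster_of(state)
        action = epsilon_greedy(q, cluster, epsilon, rng)
        for _ in range(max_steps):
            next_state, reward, done = step(state, action)
            next_cluster = cluster_of(next_state)
            next_action = epsilon_greedy(q, next_cluster, epsilon, rng)
            # On-policy TD target: bootstrap from the action actually chosen next.
            target = reward if done else reward + gamma * q[(next_cluster, next_action)]
            delta = target - q[(cluster, action)]
            traces[(cluster, action)] += 1.0  # accumulating trace
            for key in list(traces):
                q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam
            if done:
                break
            state, cluster, action = next_state, next_cluster, next_action
    return q
```

Calling `sarsa_lambda_clustered()` returns a table with at most 8 entries (4 clusters × 2 actions) instead of the 40 entries a per-state table would need for this toy task. How good the resulting approximation is depends on the clustering, which is the point the abstract makes; the fixed-size binning used here is only a stand-in for a clustering derived from prior knowledge or a pre-trained controller.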