
A Survey on Reinforcement Learning under Partially Observable Markov Environment
Abstract: This paper surveys reinforcement learning techniques for the problem of agent learning and planning under uncertainty. It first introduces the theory of partially observable Markov decision processes (POMDPs), which formalizes so-called hidden-state problems. After a brief review of other POMDP solution techniques, the discussion focuses on reinforcement learning methods for agents with no prior knowledge of the environment model. These fall into two major groups: methods that learn a value function over states of the world, and methods that search the space of policies directly. Finally, the remaining problems with these methods are analyzed, and promising directions for future research are suggested.
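As background for the hidden-state formulation the abstract refers to: a POMDP is conventionally specified as a tuple ⟨S, A, T, R, Ω, O⟩, where T(s, a, s') = Pr(s' | s, a) is the transition model and O(s', a, o) = Pr(o | s', a) is the observation model. Belief-based methods such as [1] maintain a belief state b, a probability distribution over S, which is updated after taking action a and observing o by the standard Bayesian filter (the notation here is the conventional one, not necessarily the paper's own):

$$
b'(s') = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{\Pr(o \mid a, b)},
\qquad
\Pr(o \mid a, b) = \sum_{s' \in S} O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s).
$$

A minimal NumPy sketch of this update follows; the function name, array shapes, and toy numbers are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayesian belief-state update for a POMDP (hypothetical helper).

    b : (|S|,) array         -- current belief Pr(s)
    a : int                  -- action taken
    o : int                  -- observation received
    T : (|A|,|S|,|S|) array  -- T[a, s, s'] = Pr(s' | s, a)
    O : (|A|,|S|,|O|) array  -- O[a, s', o] = Pr(o | s', a)
    """
    predicted = T[a].T @ b                    # Pr(s' | b, a): push belief through dynamics
    unnormalized = O[a][:, o] * predicted     # weight each s' by the observation likelihood
    return unnormalized / unnormalized.sum()  # divide by Pr(o | a, b)

# Toy two-state example: a single state-preserving action with an 85%-accurate sensor.
T = np.array([[[1.0, 0.0],
               [0.0, 1.0]]])      # the state does not change under the single action
O = np.array([[[0.85, 0.15],
               [0.15, 0.85]]])    # Pr(o | s') for the two possible observations
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=0, T=T, O=O))   # -> [0.85 0.15]
```

Both families of methods surveyed here build on this belief state: value-function methods learn over beliefs (or memory-based surrogates for them), while policy-search methods optimize a parameterized policy without computing beliefs explicitly.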
Source: 《长沙电力学院学报(自然科学版)》 Journal of Changsha University of Electric Power (Natural Science), 2002, No. 2, pp. 23-27 (5 pages).
Funding: National Natural Science Foundation of China (Grant No. 60075019).
Keywords: reinforcement learning (RL); partially observable Markov decision processes (POMDPs); machine learning; artificial intelligence (AI); agent; value function learning; policy space

References (8)

[1] Kaelbling L, Littman M, Cassandra A. Planning and acting in partially observable stochastic domains [J]. Artificial Intelligence, 1998, 101(1): 99-134.
[2] Cassandra A. Exact and approximate algorithms for partially observable Markov decision processes [D]. Providence: Brown University, 1998.
[3] Parr R, Russell S. Approximating optimal policies for partially observable stochastic domains [A]. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence [C]. San Francisco: Morgan Kaufmann, 1995: 1088-1094.
[4] Boyen X, Koller D. Tractable inference for complex stochastic processes [A]. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence [C]. San Francisco: Morgan Kaufmann, 1998: 33-42.
[5] Doucet A, Godsill S, Andrieu C. On sequential Monte Carlo sampling methods for Bayesian filtering [J]. Statistics and Computing, 2000, 10(3): 197-208.
[6] Sutton R, Barto A. Reinforcement Learning: An Introduction [M]. Cambridge: MIT Press, 1998.
[7] Whitehead S, Ballard D. Learning to perceive and act by trial and error [J]. Machine Learning, 1991, 7(1): 45-83.
[8] McCallum R A. Instance-based state identification for reinforcement learning [A]. In: Advances in Neural Information Processing Systems [C]. San Francisco: Morgan Kaufmann, 1995: 377-384.
