Abstract
This paper surveys reinforcement learning techniques for learning and planning by agents acting under uncertainty. It first introduces the theory of partially observable Markov decision processes (POMDPs), which is used to describe hidden-state problems. After a brief review of other POMDP solution techniques, it focuses on reinforcement learning for the case where the agent has no prior knowledge of the environment model. Two major groups of techniques are described: one learns a value function over states of the world, and the other searches directly in the space of policies. Finally, the open problems with these methods are analyzed and promising directions for future research are suggested.
Source
Journal of Changsha University of Electric Power: Natural Science (《长沙电力学院学报(自然科学版)》)
2002, No. 2, pp. 23-27 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60075019)