A Survey on Reinforcement Learning under Partially Observable Markov Environment

Abstract: This paper surveys reinforcement learning techniques for learning and planning by agents in uncertain environments. It first introduces the theory of partially observable Markov decision processes (POMDPs), which is used to describe hidden-state problems. After a brief review of other POMDP solution techniques, it focuses on reinforcement learning techniques for the case where the environment model is not known in advance. These fall into two groups: learning a value function over states, and searching the policy space directly. Finally, it analyzes the problems that remain with these methods and points out possible directions for future research.

Authors: Xie Lijuan (谢丽娟), Chen Huanwen (陈焕文)

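Affiliations: Department of Psychology, Hunan Normal University; Department of Mathematics and Computer Science, Changsha University of Electric Power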

Source: Journal of Changsha University of Electric Power: Natural Science, 2002, No. 2, pp. 23-27 (5 pages)

Funding: Supported by the National Natural Science Foundation of China (60075019)

Keywords: reinforcement learning (RL); partially observable Markov decision processes (POMDPs); machine learning; artificial intelligence (AI); agent; value function learning; policy space

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

References (8)

[1] Kaelbling L, Littman M, Cassandra A. Planning and acting in partially observable stochastic domains[J]. Artificial Intelligence, 1998, 101(1): 99-134.
[2] Cassandra A. Exact and approximate algorithms for partially observable Markov decision processes[D]. Providence: Brown University, 1998.
[3] Parr R, Russell S. Approximating optimal policies for partially observable stochastic domains[A]. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence[C]. Morgan Kaufmann, 1995, 1088-1094.
[4] Boyen X, Koller D. Tractable inference for complex stochastic processes[A]. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence[C]. Morgan Kaufmann, 1998, 33-42.
[5] Doucet A, Godsill S, Andrieu C. On sequential Monte Carlo sampling methods for Bayesian filtering[J]. Statistics and Computing, 2000, 10(3): 197-208.
[6] Sutton R, Barto A. Reinforcement learning: an introduction[M]. Cambridge, MA: MIT Press, 1998.
[7] Whitehead S, Ballard D. Learning to perceive and act by trial and error[J]. Machine Learning, 1991, 7(1): 45-83.
[8] McCallum R A. Instance-based state identification for reinforcement learning[A]. In Advances in Neural Information Processing Systems[C]. Morgan Kaufmann, 1995, 377-384.

