Dynamic Merge Reinforcement Learning Algorithm for Solving POMDP (cited by: 1)
Abstract: Modeling a reinforcement learning problem as a POMDP adapts well to, and is effective for, problems with large state spaces. However, solving a POMDP is far harder than solving an ordinary Markov decision process (MDP), so many problems remain open. Against this background, this paper proposes a method for solving POMDPs under certain special constraints: a dynamic merge reinforcement learning algorithm. Using the concept of regions, the method builds a region system over the environment's state space; the agent pursues its optimal sub-goal independently and in parallel within each region, which speeds up computation. The optimal value functions of the regions are then merged in a prescribed way to yield the optimal solution of the POMDP.
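The abstract describes the algorithm only at a high level. The Python sketch below illustrates the general shape of the idea under stated assumptions: a value function is approximated independently over each region's sampled belief points (a point-based scheme, chosen here as one plausible option) and the per-region results are then merged. The function names (belief_update, solve_region, merge_region_values), the dict-of-dicts model representation, and the additive merge rule are all assumptions for illustration, not the authors' implementation.

```python
# Minimal illustrative sketch (not the paper's implementation): per-region
# point-based value iteration over belief points, followed by a merge step.
from collections import defaultdict

def belief_update(b, a, o, states, T, O):
    """Standard POMDP belief update: b'(s') ~ O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    unnorm = [O[s2][a][o] * sum(b[i] * T[s][a][s2] for i, s in enumerate(states))
              for s2 in states]
    p_o = sum(unnorm)                      # probability of observing o
    if p_o == 0.0:
        return tuple(b), 0.0
    return tuple(p / p_o for p in unnorm), p_o

def solve_region(states, actions, observations, T, R, O,
                 belief_points, gamma=0.95, iters=100):
    """Approximate the optimal value function over sampled belief points,
    restricted to one region of the state space. Each region can be solved
    independently (e.g. in a separate process), which is where the speed-up
    described in the abstract comes from."""
    V = defaultdict(float)                 # belief point (tuple) -> value
    for _ in range(iters):
        for b in belief_points:
            q = []
            for a in actions:
                # expected immediate reward under belief b
                r = sum(b[i] * R[s][a] for i, s in enumerate(states))
                # expected discounted value over successor beliefs
                ev = sum(p_o * V[b2]
                         for o in observations
                         for b2, p_o in [belief_update(b, a, o, states, T, O)]
                         if p_o > 0.0)
                q.append(r + gamma * ev)
            V[b] = max(q)
    return V

def merge_region_values(region_value_fns, belief_points):
    """Hypothetical merge step: combine per-region value functions additively,
    in the spirit of Singh and Cohn's dynamic merging of MDPs (ref. 4).
    The paper's actual merge rule may differ."""
    return {b: sum(V.get(b, 0.0) for V in region_value_fns)
            for b in belief_points}
```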
Source: Computer Engineering (《计算机工程》), indexed in EI, CAS, CSCD, Peking University Core; 2005, No. 22, pp. 4-6, 148 (4 pages)
Funding: National Natural Science Foundation of China (Grant No. 60075019)
Keywords: Partially observable Markov decision process; Reinforcement learning; Dynamic merge; Belief state

References (5)

  • 1 McCallum A. Efficient Exploration in Reinforcement Learning with Hidden State. In: AAAI Fall Symposium on Model-directed Autonomous Systems, 1997.
  • 2 McCallum A. Reinforcement Learning with Selective Perception and Hidden State [Ph.D. Thesis]. Rochester, NY: Dept. of Computer Science, University of Rochester, 1995.
  • 3 Zhang N L, Liu W. A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains. Journal of Artificial Intelligence Research, 1997, 7: 199-230.
  • 4 Singh S, Cohn D. How to Dynamically Merge Markov Decision Processes. In: Proceedings of Neural Information Processing Systems (NIPS 12), 1999.
  • 5 Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.

Co-cited References (5)

  • 1 Boutilier C, Dearden R, Goldszmidt M. Stochastic Dynamic Programming with Factored Representations[J]. Artificial Intelligence, 2000, 121(1/2): 49-107.
  • 2 Kersting K, De Raedt L. Logical Markov Decision Programs and the Convergence of TD(λ)[C]// Proceedings of the Conference on Inductive Logic Programming. NY, USA: [s.n.], 2004: 217-231.
  • 3 Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. Cambridge, MA: MIT Press, 1998.
  • 4 Guestrin C, Koller D, Parr R. Efficient Solution Algorithms for Factored MDPs[J]. Journal of Artificial Intelligence Research, 2003, 19: 399-468.
  • 5 Kim K E, Dean T. Solving Factored MDPs Using Non-homogeneous Partitions[J]. Artificial Intelligence, 2003, 147(3): 225-251.
