
A Policy- and Value-Iteration Algorithm for POMDP (基于策略迭代和值迭代的POMDP算法)

Cited by: 7
Abstract: A partially observable Markov decision process (POMDP) converts a non-Markovian problem into a Markovian one by introducing a belief state space; its ability to describe the real world has made it an important branch of stochastic decision processes. Traditional algorithms for solving POMDPs are value iteration and policy iteration, but exact solution algorithms are computationally intractable for all but the smallest problems. This paper first describes the principles and decision process of POMDPs, and then presents a policy- and value-iteration algorithm (PVIA) for them. The algorithm combines the advantages of both methods, using policy iteration for planning and value iteration for computation; by drawing on linear programming and dynamic programming, it mitigates the curse of dimensionality that arises when the belief state space is large, and obtains an approximate optimal solution. A key contribution of this paper is showing how the basic operations of both algorithms can be performed efficiently together. The algorithm was implemented in the SZPT_Roc team, which took second place in the simulation league of the RoboCup 2006 Chinese Open Championship. Finally, experimental comparisons with several typical algorithms show that the algorithm is practical and feasible.
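The abstract rests on two building blocks: the belief-state update that restores the Markov property, and the value-iteration backup whose exact form is intractable for all but the smallest problems. A minimal sketch of both, assuming an invented two-state model (all parameter values, names, and rewards below are illustrative assumptions, not the paper's PVIA or its benchmark):

```python
import numpy as np
from itertools import product

# T[a, s, s']: transition probs; O[a, s', o]: observation probs; R[a, s]: reward.
# Tiny two-state model, invented for illustration only.
T = np.stack([np.eye(2), np.eye(2)])                 # both actions keep the state
O = np.array([[[0.85, 0.15], [0.15, 0.85]],          # action 0: informative sensor
              [[0.50, 0.50], [0.50, 0.50]]])         # action 1: uninformative
R = np.array([[-1.0, -1.0], [10.0, -100.0]])

def belief_update(b, a, o):
    """b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s).
    This update is what makes the process Markovian over belief states."""
    unnorm = O[a, :, o] * (b @ T[a])
    return unnorm / unnorm.sum()

def backup(alphas, gamma=0.95):
    """One exact DP backup: returns the next set of alpha vectors.
    Enumeration is exponential in the number of observations, which is
    the intractability that approximate methods like PVIA target."""
    n_a = T.shape[0]
    n_o = O.shape[2]
    new = []
    for a in range(n_a):
        # g[o][i](s) = sum_{s'} T[a, s, s'] * O[a, s', o] * alpha_i(s')
        g = [[T[a] @ (O[a, :, o] * al) for al in alphas] for o in range(n_o)]
        for choice in product(*g):                   # one alpha per observation
            new.append(R[a] + gamma * np.sum(choice, axis=0))
    return new

b1 = belief_update(np.array([0.5, 0.5]), a=0, o=0)   # sensor suggests state 0
alphas = backup([np.zeros(2)])                       # backup from the zero function
value = max(float(b1 @ al) for al in alphas)         # V(b) = max_alpha b . alpha
```

The exact `backup` picks one alpha vector per observation, so its output grows exponentially with iteration; the abstract's PVIA, as described, avoids this by interleaving policy iteration (for planning) with value iteration (for computation).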
Source: Journal of Computer Research and Development (《计算机研究与发展》), EI / CSCD / Peking University core journal, 2008, No. 10, pp. 1763-1768 (6 pages).
Keywords: POMDP; decision algorithm; agent; value iteration; policy iteration
