Abstract
Because of the computational complexity of its policy space and state space, solving a partially observable Markov decision process (POMDP) for an optimal policy is NP-hard. To address this, a dynamic influence diagram (DID) model is proposed for the dynamic decision-making of an agent in uncertain environments; it represents the complex relationships among system variables with a directed acyclic graph. First, the DID uses a dynamic Bayesian network to represent the transition and observation models, which compacts the state space of the system. Second, the utility function is expressed explicitly through utility nodes, simplifying its representation. Finally, the actions of the system are represented by decision nodes, which simplifies the policy space. The DID is compared with the POMDP model on these three aspects through examples. The results indicate that the dynamic influence diagram provides a compact representation for large POMDP problems, and the model is preliminarily validated in the RoboCup environment.
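To make the factored representation described above concrete, the following is a minimal, self-contained sketch of a two-slice dynamic influence diagram: chance nodes for a factored state and an observation, a decision node for the action, and a utility node. It is not taken from the paper; all variable names, probabilities, and utilities are illustrative assumptions.

```python
# Sketch of a two-slice dynamic influence diagram (DID), assuming a tiny
# factored POMDP: two binary state variables, one decision node, one
# observation node, and one utility node. All numbers are illustrative.
import itertools

STATES = list(itertools.product([0, 1], repeat=2))   # factored state (X1, X2)
ACTIONS = ["wait", "act"]                             # decision node values
OBS = [0, 1]                                          # observation node values

def transition(s_next, s, a):
    """P(s'|s, a), factored as P(x1'|x1, a) * P(x2'|x2, a)."""
    def flip(prob_stay):
        return lambda x_next, x: prob_stay if x_next == x else 1 - prob_stay
    p1 = flip(0.9 if a == "wait" else 0.6)
    p2 = flip(0.8)
    return p1(s_next[0], s[0]) * p2(s_next[1], s[1])

def observation(o, s_next, a):
    """P(o|s', a): the sensor reports X1' correctly with probability 0.85."""
    return 0.85 if o == s_next[0] else 0.15

def utility(s_next, a):
    """Utility node: reward for acting when X1' = 1, small cost otherwise."""
    if a == "act":
        return 10.0 if s_next[0] == 1 else -5.0
    return -1.0

def expected_utility(belief, a):
    """One-step (myopic) expected utility of action a under the belief."""
    return sum(belief[s] * transition(sn, s, a) * utility(sn, a)
               for s in STATES for sn in STATES)

def belief_update(belief, a, o):
    """Roll the DID forward one slice: predict with the transition DBN,
    then condition on the observation node and renormalize."""
    new_b = {}
    for sn in STATES:
        pred = sum(transition(sn, s, a) * belief[s] for s in STATES)
        new_b[sn] = observation(o, sn, a) * pred
    z = sum(new_b.values())
    return {s: p / z for s, p in new_b.items()}

if __name__ == "__main__":
    belief = {s: 1.0 / len(STATES) for s in STATES}   # uniform prior
    best = max(ACTIONS, key=lambda a: expected_utility(belief, a))
    print("greedy action:", best)
    print("updated belief:", belief_update(belief, best, o=1))
```

The point of the factorization is that the transition model is specified per state variable rather than over the joint state, so the size of the specification grows with the number of variables instead of exponentially with the joint state space.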
Source
《智能系统学报》 (CAAI Transactions on Intelligent Systems)
2008, No. 2, pp. 159-166 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (60575023, 60705015)
Supported by the Natural Science Foundation of Anhui Province (070412064)
Keywords
dynamic Bayesian networks
influence diagrams
Markov decision process
partially observable Markov decision process
dynamic influence diagram