Research on the Dynamic Influence Diagram Model (cited by: 2)

A dynamic influence diagram for dynamic decision processes
Abstract: For the partially observable Markov decision process (POMDP), the computational complexity arising from the policy space and the state space makes finding an optimal policy NP-hard. This paper therefore proposes a dynamic influence diagram to model the dynamic decision-making problem of a single agent in an uncertain environment. The model uses a directed acyclic graph to express the complex relationships among system variables. First, a dynamic Bayesian network represents the transition and observation models, which simplifies the state space of the system. Second, the utility function is expressed explicitly through utility nodes, which simplifies its representation. Finally, decision nodes represent the actions of the system, which simplifies the policy space. A worked example compares the dynamic influence diagram with the POMDP model on these three aspects. The results indicate that the dynamic influence diagram provides a concise representation for large POMDP problems. Preliminary experiments in the RoboCup environment validated the model.
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2008, No. 2, pp. 159-166 (8 pages)
Funding: National Natural Science Foundation of China (60575023, 60705015); Natural Science Foundation of Anhui Province (070412064)
Keywords: dynamic Bayesian networks; influence diagrams; Markov decision process; partially observable Markov decision process; dynamic influence diagram
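
The three ingredients named in the abstract (a factored transition and observation model, a utility node, and a decision node) can be illustrated with a minimal Python sketch. This is not the authors' model: the variables `has_ball` and `near_goal`, the two actions, and every probability and reward below are hypothetical values chosen only for illustration, and the decision rule is a myopic one-step choice rather than a full solution of the diagram.

```python
from itertools import product

# Hypothetical toy problem: every name and number here is an assumption.
ACTIONS = ("dribble", "shoot")
STATES = list(product((False, True), repeat=2))  # (has_ball, near_goal)

def p_has_ball(next_val, has_ball, action):
    """Chance node: P(has_ball' | has_ball, action)."""
    if not has_ball:
        p_true = 0.1
    else:
        p_true = 0.9 if action == "dribble" else 0.2
    return p_true if next_val else 1.0 - p_true

def p_near_goal(next_val, near_goal, action):
    """Chance node: P(near_goal' | near_goal, action)."""
    if near_goal:
        p_true = 0.9
    else:
        p_true = 0.6 if action == "dribble" else 0.1
    return p_true if next_val else 1.0 - p_true

def transition(s_next, s, action):
    """Factored transition model: a product of per-variable CPTs."""
    return (p_has_ball(s_next[0], s[0], action)
            * p_near_goal(s_next[1], s[1], action))

def observation(obs, s):
    """Observation node: a noisy reading of near_goal, correct 80% of the time."""
    return 0.8 if obs == s[1] else 0.2

def utility(s, action):
    """Utility node: depends only on its parents (state and action)."""
    has_ball, near_goal = s
    if action == "shoot":
        return 10.0 if (has_ball and near_goal) else -2.0
    return 1.0 if has_ball else 0.0

def belief_update(belief, action, obs):
    """Bayes filter: predict with the transition model, correct with the observation."""
    new = {}
    for s_next in STATES:
        predicted = sum(transition(s_next, s, action) * belief[s] for s in STATES)
        new[s_next] = observation(obs, s_next) * predicted
    norm = sum(new.values())
    return {s: p / norm for s, p in new.items()}

def best_action(belief):
    """Decision node: choose the action with the highest expected one-step utility."""
    def expected(action):
        return sum(belief[s] * utility(s, action) for s in STATES)
    return max(ACTIONS, key=expected)

belief = {s: 0.25 for s in STATES}               # uniform prior over joint states
belief = belief_update(belief, "dribble", True)  # act, then observe "near goal"
print(best_action(belief))
```

In this sketch each state variable carries its own small conditional probability table, whereas a flat POMDP specification would store one full joint transition matrix per action. The difference is small with two binary variables and grows quickly as variables are added.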


