Abstract
Because of the computational complexity of its policy space and state space, solving a partially observable Markov decision process (POMDP) for an optimal policy is NP-hard. To address this, a dynamic influence diagram (DID) model is proposed for the dynamic decision-making of an agent in uncertain environments; it represents the complex relationships among system variables with a directed acyclic graph. First, the DID uses a dynamic Bayesian network to represent the transition and observation models, which compacts the state space of the system. Second, the utility function is expressed explicitly through utility nodes, simplifying its representation. Finally, the actions of the system are represented by decision nodes, which simplifies the policy space. The DID is compared with the POMDP model on these three aspects through examples. The results indicate that the dynamic influence diagram provides a compact representation for large POMDP problems, and the model is preliminarily validated in the RoboCup environment.
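To make the factored representation described above concrete, the following is a minimal, self-contained sketch of a two-slice dynamic influence diagram: chance nodes for a factored state and an observation, a decision node for the action, and a utility node. It is not taken from the paper; all variable names, probabilities, and utilities are illustrative assumptions.

```python
# Sketch of a two-slice dynamic influence diagram (DID), assuming a tiny
# factored POMDP: two binary state variables, one decision node, one
# observation node, and one utility node. All numbers are illustrative.
import itertools

STATES = list(itertools.product([0, 1], repeat=2))   # factored state (X1, X2)
ACTIONS = ["wait", "act"]                             # decision node values
OBS = [0, 1]                                          # observation node values

def transition(s_next, s, a):
    """P(s'|s, a), factored as P(x1'|x1, a) * P(x2'|x2, a)."""
    def flip(prob_stay):
        return lambda x_next, x: prob_stay if x_next == x else 1 - prob_stay
    p1 = flip(0.9 if a == "wait" else 0.6)
    p2 = flip(0.8)
    return p1(s_next[0], s[0]) * p2(s_next[1], s[1])

def observation(o, s_next, a):
    """P(o|s', a): the sensor reports X1' correctly with probability 0.85."""
    return 0.85 if o == s_next[0] else 0.15

def utility(s_next, a):
    """Utility node: reward for acting when X1' = 1, small cost otherwise."""
    if a == "act":
        return 10.0 if s_next[0] == 1 else -5.0
    return -1.0

def expected_utility(belief, a):
    """One-step (myopic) expected utility of action a under the belief."""
    return sum(belief[s] * transition(sn, s, a) * utility(sn, a)
               for s in STATES for sn in STATES)

def belief_update(belief, a, o):
    """Roll the DID forward one slice: predict with the transition DBN,
    then condition on the observation node and renormalize."""
    new_b = {}
    for sn in STATES:
        pred = sum(transition(sn, s, a) * belief[s] for s in STATES)
        new_b[sn] = observation(o, sn, a) * pred
    z = sum(new_b.values())
    return {s: p / z for s, p in new_b.items()}

if __name__ == "__main__":
    belief = {s: 1.0 / len(STATES) for s in STATES}   # uniform prior
    best = max(ACTIONS, key=lambda a: expected_utility(belief, a))
    print("greedy action:", best)
    print("updated belief:", belief_update(belief, best, o=1))
```

The point of the factorization is that the transition model is specified per state variable rather than over the joint state, so the size of the specification grows with the number of variables instead of exponentially with the joint state space.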
Source
《智能系统学报》 (CAAI Transactions on Intelligent Systems)
2008, No. 2, pp. 159-166 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (60575023, 60705015)
Supported by the Natural Science Foundation of Anhui Province (070412064)
Keywords
dynamic Bayesian networks
influence diagrams
Markov decision process
partially observable Markov decision process
dynamic influence diagram