Interactive dynamic influence diagrams and an exact solution algorithm

Abstract: To represent the dynamic structural relationships among agents making decisions in a partially observable Markov environment, influence diagrams (IDs) are extended in both structure and time, yielding a decision model that can also model other agents: interactive dynamic influence diagrams (I-DIDs). I-DIDs are graphical models for sequential decision making by multiple agents under uncertainty. Solving an I-DID gives the agent its optimal decision given its belief and a predicted probability distribution over the other agents' behavior, allowing multi-agent decision problems to be solved more effectively. Exact algorithms for I-DIDs must solve every candidate model of the other agents and update all of those models at every time step. The I-DID state space is therefore very large, and the space of candidate models grows exponentially with the number of time slices, which makes computation expensive. To curb this growth, an exact solution method is proposed that uses behavioral equivalence to reduce the candidate models to a minimal set and updates only the selected models, which reduces the computational cost. Simulation experiments on model instances demonstrate the effectiveness of the algorithm.
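The idea of shrinking the candidate model set through behavioral equivalence can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that each candidate model of the other agent has already been solved and that its policy is stored as a hashable tuple; the names `merge_behaviorally_equivalent` and `unpruned_model_bound`, the model identifiers, and the tiger-style action labels are all hypothetical. Models with identical policies are collapsed into one representative that carries their combined probability mass. The second function gives a rough upper bound on how many models exist without pruning, under the assumption that every model is updated once per action-observation pair at each step.

```python
from collections import defaultdict


def merge_behaviorally_equivalent(models):
    """Collapse models that prescribe the same policy into one representative.

    models: list of (model_id, policy, prob) tuples, where policy is hashable
    (an assumed, flattened stand-in for a solved policy tree).
    Returns a list of (representative_id, policy, total_prob).
    """
    groups = defaultdict(list)
    for model_id, policy, prob in models:
        groups[policy].append((model_id, prob))

    merged = []
    for policy, members in groups.items():
        representative_id = members[0][0]  # keep the first model seen
        total_prob = sum(prob for _, prob in members)  # pool the belief mass
        merged.append((representative_id, policy, total_prob))
    return merged


def unpruned_model_bound(n_models, n_actions, n_observations, steps):
    """Upper bound on the model count after `steps` updates without pruning,
    assuming one updated model per (action, observation) pair per step."""
    return n_models * (n_actions * n_observations) ** steps


# Hypothetical candidate models: m1 and m2 prescribe the same behavior.
candidates = [
    ("m1", ("listen", "open_left"), 0.25),
    ("m2", ("listen", "open_left"), 0.25),
    ("m3", ("listen", "open_right"), 0.5),
]
minimal = merge_behaviorally_equivalent(candidates)
print(minimal)
print(unpruned_model_bound(3, 3, 2, 4))
```

Because only one representative per behavior class is carried forward, the number of models updated at the next time step depends on the number of distinct policies, not on the number of candidate models.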

Authors: LI Bo (李波), CAO Lang-cai (曹浪财), ZHUANG Jin-fa (庄进发)

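Affiliations: School of Information Science and Technology, Xiamen University; Xiamen Dongnan Rongtong (厦门东南融通) System Engineering Co., Ltd.; College of Communication and Information, PLA Information Engineering University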

Source: Journal of PLA University of Science and Technology (Natural Science Edition) (《解放军理工大学学报（自然科学版）》), 2011, No. 2, pp. 119-124 (6 pages). Indexed in EI and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (Grant No. 60975052)

Keywords: multi-agent decision making; interactive dynamic influence diagrams (I-DIDs); behavioral equivalence; minimal model update set

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
