Abstract
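To represent the dynamic structural relationships among agents in multi-agent decision making under partially observable Markov settings, influence diagrams (IDs) are extended in both structure and time to form a decision model capable of modeling other agents: interactive dynamic influence diagrams (I-DIDs). I-DIDs are graphical models for sequential multi-agent decision making in uncertain environments. The solution of the model is the optimal decision for an agent given a predicted probability distribution over the other agents' behavior, which allows multi-agent decision problems to be solved more effectively. However, the state space of I-DIDs is very large, and the space of candidate models of the other agents grows exponentially with the number of time slices, which makes computation expensive. A method based on behavioral equivalence is therefore proposed that minimizes the model set, limiting model growth to curb the expansion of the model space and so simplify computation. Simulation experiments on model instances demonstrate the effectiveness of the algorithm.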
To represent the dynamic relationships between agents in a multi-agent, partially observable Markov decision process, interactive dynamic influence diagrams (I-DIDs) are presented by extending influence diagrams (IDs) over time and structure. I-DIDs are graphical models for sequential decision making in partially observable settings shared with other agents. They can be used to compute the policy of an agent given its belief as the agent acts and observes in the setting. Exact algorithms for solving I-DIDs require solving the possible models of the other agents and then updating all models at every time step. The space of the other agents' models grows exponentially with the number of time steps, increasing the computational complexity. An exact solution method for I-DIDs based on minimal sets is therefore presented, which reduces the space of the other agents' possible models and updates the selected models, thereby reducing the computational complexity. Finally, model instances are given, and the experimental results show the validity of the algorithm.
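The idea of shrinking the model set through behavioral equivalence can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes each candidate model of the other agent has already been solved, that its policy can be written as a flat tuple of actions (real I-DID policies are trees over observations), and that the names `minimize_model_set`, the model ids, and the action labels are hypothetical. Models with identical policies are merged into one representative whose probability is the sum of the merged models' probabilities.

```python
from collections import defaultdict


def minimize_model_set(models):
    """Merge behaviorally equivalent models (illustrative sketch only).

    models: list of (model_id, probability, policy) triples, where policy
    is a hashable tuple of actions. This flat representation is an
    assumption made for the example.
    Returns one (representative_id, total_probability, policy) per class.
    """
    classes = defaultdict(list)
    for model_id, prob, policy in models:
        # Models prescribing the same policy fall into the same class.
        classes[policy].append((model_id, prob))

    minimal = []
    for policy, members in classes.items():
        representative_id = members[0][0]            # keep the first model seen
        total_prob = sum(p for _, p in members)      # pool the belief mass
        minimal.append((representative_id, total_prob, policy))
    return minimal


# Hypothetical candidate models of the other agent.
candidates = [
    ("m1", 0.25, ("listen", "open-left")),
    ("m2", 0.25, ("listen", "open-left")),
    ("m3", 0.50, ("listen", "open-right")),
]
reduced = minimize_model_set(candidates)
```

Here three candidate models collapse to two representatives, so only two models would need to be carried forward and updated at the next time step, while the pooled probabilities still sum to one.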
Source
Journal of PLA University of Science and Technology (Natural Science Edition) (《解放军理工大学学报(自然科学版)》)
EI
Peking University Core Journal
2011, No. 2, pp. 119-124 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (60975052)
Keywords
multi-agent decision making
interactive dynamic influence diagrams (I-DIDs)
behavioral equivalence
minimal model updating sets