
Research on an Agent Decision Model Based on the DP-SAMQ Behavior Tree (cited by: 2)

Research on Agent Decision Model Based on Multi-Step Q-Learning Behavior Tree
Abstract: Using behavior trees for decision-making in multi-agent simulation is intuitive and easy to extend, but designing a behavior tree is a complex process and manual tuning is inefficient. This paper introduces Q-Learning to realize the automatic design of behavior trees. To address the slow convergence of traditional Q-Learning, the Metropolis criterion from simulated annealing is applied to the action-selection strategy, so that the probability of selecting suboptimal actions adapts as learning progresses, and a dynamic-programming idea is applied to the Q-value update, replaying each episode in reverse order. The experimental results show that the agent decision model based on the improved multi-step Q-Learning behavior tree converges faster, achieves automatic design and optimization of the behavior tree, and reduces the use of condition nodes, yielding more reasonable behavior decisions.
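The abstract describes two modifications to standard Q-Learning: a Metropolis-criterion action-selection rule borrowed from simulated annealing, and a reverse-order, dynamic-programming-style multi-step Q-value update. The following is a minimal sketch of how these two ideas are commonly realized, assuming a tabular Q-learning setting; the function names, data structures, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random
from collections import defaultdict

# Illustrative sketch only: a tabular Q-table keyed by (state, action).
# All names and hyperparameters are assumptions, not the authors' code.
Q = defaultdict(float)

def select_action(state, actions, temperature):
    """Metropolis-criterion action selection: a randomly drawn candidate that is
    worse than the greedy action is still accepted with probability
    exp(-(Q_greedy - Q_candidate) / T); as T is cooled during learning, the
    chance of picking suboptimal actions shrinks adaptively."""
    greedy = max(actions, key=lambda a: Q[(state, a)])
    candidate = random.choice(actions)
    delta = Q[(state, greedy)] - Q[(state, candidate)]
    if delta <= 0 or random.random() < math.exp(-delta / max(temperature, 1e-8)):
        return candidate
    return greedy

def replay_episode(trajectory, alpha=0.1, gamma=0.9):
    """Multi-step, dynamic-programming-style update: the episode is replayed in
    reverse order, so each successor's Q-value has already been refreshed when
    an earlier step is updated and the reward propagates back in one pass."""
    for state, action, reward, next_state, next_actions in reversed(trajectory):
        next_value = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + gamma * next_value
        Q[(state, action)] += alpha * (target - Q[(state, action)])
```

A typical usage of such a sketch would cool the temperature after each episode (for example, temperature *= 0.98) so that exploration of suboptimal behavior-tree actions decreases as learning proceeds.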
Authors: 陈妙云 (CHEN Miao-yun); 王雷 (WANG Lei); 丁治强 (DING Zhi-qiang) — School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230031, China
Source: Computer Simulation (《计算机仿真》, Peking University Core Journal), 2021, Issue 2, pp. 301-307 (7 pages)
Funding: Chinese Academy of Sciences Innovation Fund (High-Tech Project CXJJ-17-M139); Chinese Academy of Sciences Major Program Project (KGFZD-135-18-027).
Keywords: Multi-agent; Behavior tree; Simulated annealing; Dynamic programming; DP-SAMQ (multi-step Q-learning improved with dynamic programming and simulated annealing)