Abstract
Using behavior trees for decision-making in multi-agent simulation is intuitive and easy to extend, but designing a behavior tree is complex and manual debugging is inefficient. This paper introduces Q-Learning to automate behavior-tree design. To address the slow convergence of traditional Q-Learning, the Metropolis criterion from simulated annealing is applied to the action selection strategy, adaptively changing the probability of selecting suboptimal actions as learning progresses, and a dynamic-programming idea is applied to the Q-value update strategy, updating the Q function in reverse order. Experimental results show that the agent decision model based on the improved multi-step Q-Learning behavior tree converges faster, reduces the use of conditional nodes, and achieves automatic design and optimization of the behavior tree with more reasonable behavior decisions.
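The abstract names the two modifications but gives no formulas. Below is a minimal Python sketch of how they could fit together, assuming a tabular Q function keyed by (state, action) pairs and a geometrically annealed temperature; all identifiers (metropolis_select, backward_sweep_update, ACTIONS) and the toy episode are hypothetical illustrations, not taken from the paper.

```python
import math
import random
from collections import defaultdict

def metropolis_select(q, state, actions, temperature):
    """Action selection via the Metropolis criterion from simulated annealing.

    A random candidate competes against the current greedy action; a worse
    candidate is still accepted with probability exp(dQ / T), so the chance
    of choosing a suboptimal action decays as T is annealed toward zero.
    """
    greedy = max(actions, key=lambda a: q[(state, a)])
    candidate = random.choice(actions)
    dq = q[(state, candidate)] - q[(state, greedy)]
    if dq >= 0 or random.random() < math.exp(dq / temperature):
        return candidate
    return greedy

def backward_sweep_update(q, episode, actions, alpha=0.1, gamma=0.9):
    """Update Q values over one finished episode in reverse order.

    episode: list of (state, action, reward, next_state) tuples, with
    next_state=None at the terminal step. Sweeping backwards lets each
    update bootstrap from successor values refreshed in the same pass,
    which is the dynamic-programming flavour of the multi-step variant.
    """
    for state, action, reward, next_state in reversed(episode):
        bootstrap = 0.0 if next_state is None else max(
            q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * bootstrap
                                       - q[(state, action)])

# Hypothetical toy setup: three agent behaviors and a two-step episode.
ACTIONS = ["patrol", "attack", "flee"]
q = defaultdict(float)                           # tabular Q(s, a)
temperature = 1.0
for _ in range(100):
    episode = [("idle", "patrol", 0.0, "enemy"),
               ("enemy", "attack", 1.0, None)]   # stand-in for a simulated run
    backward_sweep_update(q, episode, ACTIONS)
    action = metropolis_select(q, "idle", ACTIONS, temperature)
    temperature *= 0.99                          # anneal exploration away
```

The annealing schedule (a fixed 0.99 decay) is one plausible choice for "adaptively changing the selection probability"; the paper may use a different schedule.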
Authors
CHEN Miao-yun, WANG Lei, DING Zhi-qiang (School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230031, China)
Source
Computer Simulation (《计算机仿真》), Peking University Core Journal, 2021, No. 2, pp. 301-307 (7 pages)
Funding
Innovation Fund of the Chinese Academy of Sciences (High-Tech Project CXJJ-17-M139); Major Special Project of the Chinese Academy of Sciences (KGFZD-135-18-027).