Multi-Agent Evasion Algorithm Design Based on Deep Q-Network

Abstract: Existing studies of multi-agent pursuit-evasion games are usually set in a two-dimensional plane, and the evader's motion is left unconstrained. In addition, traditional methods have difficulty designing control strategies when no accurate model is available. This paper therefore proposes a multi-agent evasion algorithm based on the deep Q-network (DQN) for the case in which the evader's motion is constrained in three-dimensional space. The algorithm is decentralized: each evader agent obtains the desired evasion strategy by exploring and learning from the environment. To improve learning efficiency, policy learning is divided into two stages according to task difficulty, and corresponding reward functions are designed to guide the agent toward the desired evasion strategy. Simulation results show that the learned evasion strategy performs stably and generalizes: the evader still escapes successfully when the initial positions are changed within a certain range.
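The abstract names the building blocks (a DQN learner, decentralized per-agent learning, and a two-stage reward) but gives none of their details. The following is a minimal sketch under stated assumptions, not the authors' implementation. A linear Q-function stands in for the deep network, and the action names in ACTIONS, the reward values in staged_reward, and every function and class name are hypothetical. The sketch shows epsilon-greedy exploration, experience replay, target-network syncing, and the standard DQN target y = r + γ·max_a' Q_target(s', a'), which reduces to y = r at terminal states.

```python
import random
from collections import deque

# Hypothetical discrete maneuver set for a 3D evader; the abstract does not
# specify the paper's action space.
ACTIONS = ["climb", "dive", "turn_left", "turn_right", "hold"]


class LinearQ:
    """Q(s, a) = w_a . s, a linear stand-in for the deep network used by DQN."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, s):
        return [sum(wi * si for wi, si in zip(w_a, s)) for w_a in self.w]

    def copy_from(self, other):
        # Target-network sync: copy the online weights (not a shared reference).
        self.w = [row[:] for row in other.w]


class ReplayBuffer:
    """Fixed-size experience replay storing (s, a, r, s_next, done) tuples."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)


def epsilon_greedy(q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise take argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q.w))
    values = q.q_values(s)
    return max(range(len(values)), key=values.__getitem__)


def td_target(target_q, r, s_next, done, gamma):
    """DQN target: y = r if terminal, else r + gamma * max_a' Q_target(s', a')."""
    if done:
        return r
    return r + gamma * max(target_q.q_values(s_next))


def dqn_update(online_q, target_q, batch, gamma, lr):
    """One semi-gradient step on 0.5 * (y - Q(s, a))^2 per sampled transition."""
    for s, a, r, s_next, done in batch:
        y = td_target(target_q, r, s_next, done, gamma)
        error = y - online_q.q_values(s)[a]
        online_q.w[a] = [wi + lr * error * si for wi, si in zip(online_q.w[a], s)]


def staged_reward(stage, separation, violated_constraint, escaped):
    """Hypothetical two-stage reward; the paper's actual reward terms are not given.

    Stage 1 (easier task): dense shaping proportional to distance from the pursuer.
    Stage 2 (harder task): sparse reward only for a successful escape.
    Both stages penalize violating the evader's motion constraints.
    """
    if violated_constraint:
        return -1.0
    if stage == 1:
        return 0.01 * separation
    return 10.0 if escaped else 0.0
```

In a decentralized arrangement like the one the abstract describes, each evader would hold its own online/target pair and replay buffer and learn from its own observations. The point at which stage switches from 1 to 2 is an assumption here; a natural choice would be once the stage-1 behavior is learned reliably.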

Authors: YAN Bo-wei (闫博为), DU Run-le (杜润乐), BAN Xiao-jun (班晓军), ZHOU Di (周荻)

Affiliations: School of Astronautics, Harbin Institute of Technology, Harbin 150000, China; National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics, Beijing 100076, China

Source: Navigation Positioning and Timing (《导航定位与授时》), CSCD-indexed, 2022, No. 6, pp. 40-47 (8 pages)

Keywords: evasion algorithm; deep reinforcement learning; multi-agent; deep Q-network

CLC number: V448 [Aeronautical and Astronautical Science and Technology: Flight Vehicle Design]
