
Research on UAV Swarm Confrontation Method Based on Multi-agent Reinforcement Learning (times cited: 2)
Abstract: To address UAV swarm confrontation in complex, dynamic, and uncertain environments, this paper studies a confrontation decision-making method based on multi-agent reinforcement learning. First, a UAV swarm confrontation model is built on the MaCA environment. Second, a hybrid architecture with a centralized training network is introduced to improve the traditional DDPG algorithm, yielding a MADDPG algorithm designed for UAV swarm confrontation. The algorithm is trained against a rule-based confrontation strategy and a DQN-based confrontation strategy, respectively, which improves its robustness, adaptability, and generalization. Finally, a confrontation simulation environment is built to verify the effectiveness and reliability of the proposed method.
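The abstract names the centralized-training idea but not its update rules. As a point of reference, the sketch below shows how a standard MADDPG critic target is usually formed: each target actor acts only on its own observation (decentralized execution), while the critic of agent i scores the joint observation and joint action (centralized training). It is not the authors' code. The helper names (`maddpg_critic_target`, `soft_update`, `toy_critic`, `toy_actors`), the discount `gamma`, the soft-update rate `tau`, and the assumption that the centralized state is simply the concatenation of all agents' observations are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Actor = Callable[[Vector], Vector]          # decentralized: own observation -> own action
Critic = Callable[[Vector, Vector], float]  # centralized: (joint obs, joint action) -> Q value


def concat(parts: Sequence[Vector]) -> Vector:
    """Flatten per-agent vectors into one joint vector (assumed centralized state)."""
    return [x for part in parts for x in part]


def maddpg_critic_target(
    reward_i: float,
    done: bool,
    next_obs: Sequence[Vector],
    target_actors: Sequence[Actor],
    target_critic_i: Critic,
    gamma: float = 0.95,
) -> float:
    """TD target for agent i: y_i = r_i + gamma * Q_i'(x', a_1', ..., a_N').

    Each next action a_j' = mu_j'(o_j') uses only agent j's own observation,
    while the critic of agent i sees all observations and all actions.
    """
    if done:
        return reward_i  # no bootstrapping past a terminal step
    next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]
    return reward_i + gamma * target_critic_i(concat(next_obs), concat(next_actions))


def soft_update(target: Vector, source: Vector, tau: float = 0.01) -> Vector:
    """Polyak averaging of target parameters: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


# Toy usage with two hypothetical UAVs (2-D observations, 1-D actions).
def toy_critic(joint_obs: Vector, joint_act: Vector) -> float:
    return sum(joint_obs) + sum(joint_act)  # stand-in for a neural network critic


toy_actors: List[Actor] = [lambda o: [0.5 * o[0]], lambda o: [-o[1]]]
y = maddpg_critic_target(1.0, False, [[1.0, 2.0], [3.0, 4.0]], toy_actors, toy_critic, gamma=0.5)
print(y)  # 4.25
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))  # [0.5, 1.0]
```

Actors and critics are kept as plain callables so the centralized/decentralized split is visible in the function signatures. In a MaCA-based setup they would presumably be neural networks, and the rule-based or DQN-based opponent would act as part of the environment that produces `next_obs` and `reward_i`.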
Authors: YANG Shuheng, ZHANG Dong, REN Zhi, TANG Shuo (School of Aerospace, Northwest Polytechnic University, Xi'an 710072, China; Shaanxi Key Laboratory of Aerospace Vehicle Design, Northwest Polytechnic University, Xi'an 710072, China)
Source: Unmanned Systems Technology, 2022, No. 5, pp. 51-62 (12 pages)
Funding: National Natural Science Foundation of China (61903301)
Keywords: UAV swarm confrontation; multi-agent reinforcement learning; MaCA; DQN algorithm; MADDPG algorithm
References: 6. Secondary references: 139.


Co-citing literature: 199. Co-cited literature: 11. Citing literature: 2. Secondary citing literature: 4.
