
Pursuit missions for UAV swarms based on DDPG algorithm (cited by: 29)

Abstract: Unmanned Aerial Vehicle (UAV) swarm technology has been a research hotspot in recent years. With the continuous advancement of UAV autonomous intelligence, UAV swarm technology is bound to become one of the main trends in future UAV development. For the collaborative pursuit mission of a UAV swarm against incoming enemy targets, we construct a typical task scenario and, based on the Deep Deterministic Policy Gradient (DDPG) algorithm, design a guided reward function that effectively solves the sparse-reward problem of deep reinforcement learning in long-horizon missions. We also introduce a soft-update strategy based on a sliding average to reduce the parameter oscillations of the Eval network and the Target network during training, thereby improving training efficiency. Simulation results show that, after training, the UAV swarm can successfully carry out pursuit missions against incoming enemy targets with a success rate of 95%. UAV swarm technology, as a brand-new combat mode, has potential application value in the military field, and artificial intelligence algorithms show promise for the development of autonomous, intelligent decision-making in UAV swarms.
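The sliding-average soft update described in the abstract corresponds to Polyak averaging of the Target network parameters toward the Eval (online) network parameters. The paper does not give code, so the PyTorch-style sketch below is only an illustration; the function name soft_update and the coefficient tau are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def soft_update(target_net: nn.Module, eval_net: nn.Module, tau: float = 0.01) -> None:
    """Sliding-average (Polyak) soft update: target <- tau * eval + (1 - tau) * target.

    A small tau moves the Target network only slightly per step, which damps the
    parameter oscillations between the Eval and Target networks during training.
    """
    with torch.no_grad():
        for t_param, e_param in zip(target_net.parameters(), eval_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * e_param)

# Typical DDPG usage (names are illustrative): after each gradient step on the
# Eval actor and critic, nudge the corresponding Target networks toward them.
#   soft_update(actor_target, actor_eval, tau=0.01)
#   soft_update(critic_target, critic_eval, tau=0.01)
```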
Authors: ZHANG Yaozhong (张耀中), XU Jialin (许佳林), YAO Kangjia (姚康佳), LIU Jieling (刘洁凌) (School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China; Xi'an North Electro-Optic Science & Technology Co., Ltd., Xi'an 710043, China)
Source: Acta Aeronautica et Astronautica Sinica (航空学报), indexed in EI, CAS, CSCD, and the Peking University Core Journals list, 2020, Issue 10, pp. 309-321 (13 pages).
Funding: Aeronautical Science Foundation of China (2017ZC53033).
Keywords: DDPG algorithm; UAV swarms; task decision; deep reinforcement learning; sparse rewards
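The guided reward function mentioned in the abstract is what turns the sparse terminal reward of the pursuit task into a dense learning signal. The paper's exact shaping terms are not reproduced here; the sketch below, with a hypothetical capture_radius and hypothetical coefficients, only illustrates the general idea of rewarding each step in proportion to the distance closed toward the incoming target.

```python
import numpy as np

def guided_pursuit_reward(uav_pos: np.ndarray,
                          target_pos: np.ndarray,
                          prev_distance: float,
                          capture_radius: float = 50.0) -> tuple:
    """Illustrative guided (shaped) reward for a single pursuing UAV.

    Instead of rewarding only the terminal capture event (sparse reward), each
    step earns a small bonus proportional to the distance closed toward the
    target, plus a large bonus when the target enters the capture radius.
    All coefficients here are assumptions for illustration only.
    """
    distance = float(np.linalg.norm(uav_pos - target_pos))
    reward = 0.1 * (prev_distance - distance)   # dense guidance term
    if distance < capture_radius:
        reward += 100.0                         # terminal capture bonus
    return reward, distance                     # feed distance back as prev_distance next step
```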

References: 4

Secondary references: 23


Co-cited literature: 199

Co-citing literature: 303

Citing literature: 29

Secondary citing literature: 151
