Abstract
Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years. As the autonomous intelligence of UAVs continues to improve, swarm technology is bound to become one of the main trends in future UAV development. For the task of a UAV swarm cooperatively pursuing incoming enemy targets, we construct a typical task scenario and, based on the Deep Deterministic Policy Gradient (DDPG) algorithm, design a guided reward function that effectively addresses the sparse-reward problem of deep reinforcement learning in long-horizon tasks. We also introduce a soft update strategy based on a sliding average, which reduces parameter oscillation in the Eval and Target networks of DDPG during training and improves training efficiency. Simulation results show that the trained UAV swarm performs the pursuit task well, with a task success rate of 95%. As a new concept of combat mode, UAV swarm technology has potential application value in the military field, and artificial intelligence algorithms show application prospects for making the autonomous decision-making of UAV swarms more intelligent.
Authors
张耀中
许佳林
姚康佳
刘洁凌
ZHANG Yaozhong; XU Jialin; YAO Kangjia; LIU Jieling (School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China; Xi'an North Electro-optic Science & Technology Co. Ltd., Xi'an 710043, China)
Source
《航空学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, Issue 10, pp. 309-321 (13 pages)
Acta Aeronautica et Astronautica Sinica
Funding
Aeronautical Science Foundation of China (2017ZC53033).