
A Multi-UAV Collision Avoidance Decision-Making Method Based on Reinforcement Learning

Cited by: 1
Abstract: As the low-altitude airspace environment grows more complex, the probability of conflict among UAVs performing missions keeps increasing. The traditional reinforcement learning algorithms SAC and DDPG converge slowly and unstably when applied to collision avoidance among multiple UAVs in limited airspace. To address these shortcomings, a Multi-Agent Reinforcement Learning (MARL) method based on the PPO2 algorithm is proposed. First, the multi-UAV flight decision-making problem is formulated as a Markov decision process. Second, the state space and reward function are designed so that the policy is optimized by maximizing the cumulative reward, which makes overall training more stable and convergence faster. Finally, a flight simulation scene is built on the TensorFlow deep learning framework and the Gym reinforcement learning environment, and simulation experiments are carried out. The experimental results show that, compared with the methods based on SAC and DDPG, the proposed method raises the collision avoidance success rate by about 37.74 and 49.15 percentage points, respectively. It therefore handles collision avoidance among multiple UAVs better and is superior in both convergence speed and convergence stability.
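To make the two ingredients named in the abstract concrete (the cumulative reward and PPO-style policy optimization), the following is a minimal NumPy sketch. It is not the authors' implementation: the abstract gives neither the state space, the reward function, nor any hyperparameters. The sketch assumes that "PPO2" refers to the clipped-surrogate form of PPO, as that name is commonly used, and the function names, `clip_eps=0.2`, and `gamma=0.99` are illustrative assumptions.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Cumulative discounted reward G_t = r_t + gamma * G_{t+1} for one episode.

    gamma=0.99 is an assumed default, not a value taken from the paper.
    """
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy.
    clip_eps=0.2 is an assumed default, not a value taken from the paper.
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio between the current and the old policy.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The element-wise minimum removes the incentive for large policy steps.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

Clipping the probability ratio bounds how far a single update can move the policy. This is the usual explanation for the stable training associated with PPO, and it is consistent with the convergence stability the abstract reports.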
Authors: YANG Yanfei (杨艳飞); ZHU Yanping (诸燕平); HU Can (胡灿); ZHANG Bin (张斌) (Changzhou University, School of Computer Science and Artificial Intelligence, Changzhou 213000, China; Changzhou University, School of Microelectronics and Control Engineering, Changzhou 213000, China)
Source: Electronics Optics & Control (《电光与控制》), 2023, No. 9, pp. 112-118 (7 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: Jiangsu Provincial Graduate Research and Innovation Program (KYCX22_3053).
Keywords: UAV; Deep Reinforcement Learning (DRL); multi-agent; collision avoidance; PPO2
