Abstract
To address UAV obstacle avoidance in complex obstacle environments, an avoidance strategy based on velocity obstacle and proximal policy optimization (VO-PPO) is proposed. First, the state space is constructed from the UAV's own state information and the obstacle information described by the velocity obstacle method, and a reward and penalty function based on the VO region, covering both speed and distance, is designed. Second, a Proximal Policy Optimization algorithm framework is designed under the Actor-Critic network structure; through interaction between the agent and the environment, the network parameters that maximize reward are trained, realizing UAV obstacle avoidance in complex environments. Finally, a comparison of performance metrics against other algorithms in simulation experiments demonstrates that the proposed algorithm generalizes well and is effective.
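The abstract names two building blocks, the velocity obstacle (VO) test and the PPO objective, without giving their form. The following is a minimal 2D sketch of both, not the authors' implementation: the function names, the circular-obstacle model, the reward weights `w_vo` and `w_dist`, and the goal-distance term are all assumptions made for illustration. Only the clipped surrogate follows the standard PPO-clip definition.

```python
import math


def in_velocity_obstacle(p_uav, v_uav, p_obs, v_obs, radius):
    """Return True if the UAV velocity lies inside the VO cone (2D).

    Assumption: the obstacle is a circle whose `radius` already includes
    the UAV's own safety radius.
    """
    # Line of sight (UAV -> obstacle) and velocity relative to the obstacle.
    rx, ry = p_obs[0] - p_uav[0], p_obs[1] - p_uav[1]
    vx, vy = v_uav[0] - v_obs[0], v_uav[1] - v_obs[1]
    dist = math.hypot(rx, ry)
    if dist <= radius:
        return True  # already inside the safety circle
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        return False  # no relative motion, so no future collision
    # Half-angle of the collision cone seen from the UAV.
    half_angle = math.asin(radius / dist)
    # Angle between the relative velocity and the line of sight.
    cos_angle = (rx * vx + ry * vy) / (dist * speed)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    return angle < half_angle


def vo_reward(p_uav, v_uav, p_obs, v_obs, radius, goal, w_vo=1.0, w_dist=0.1):
    """Hypothetical reward: penalize velocities inside the VO region and
    penalize remaining distance to the goal. Weights are illustrative."""
    penalty = -w_vo if in_velocity_obstacle(p_uav, v_uav, p_obs, v_obs, radius) else 0.0
    dist_goal = math.hypot(goal[0] - p_uav[0], goal[1] - p_uav[1])
    return penalty - w_dist * dist_goal


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample.

    `ratio` is pi_new(a|s) / pi_old(a|s); `eps` is the clip range.
    """
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

For instance, a UAV at the origin flying along +x toward a stationary obstacle at (5, 0) with radius 1 is inside the VO cone, while the same UAV flying along +y is not. In a full training loop the per-sample objective would be averaged over a batch and maximized with respect to the Actor parameters, with the Critic supplying the advantage estimates.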
Authors
焦卫东
刘爽
张思远
JIAO Wei-dong; LIU Shuang; ZHANG Si-yuan (Civil Aviation University of China, Tianjin 300000)
Source
《航空计算技术》
2024, No. 3, pp. 16-19, 24 (5 pages in total)
Aeronautical Computing Technique
Funding
Supported by the National Key Basic Research Development Program of China (2020YFB1600101).
Keywords
deep reinforcement learning
UAV
obstacle avoidance
complex environment