Abstract
Unmanned aerial vehicle (UAV) cluster systems offer capability redundancy, strong resistance to destruction, and adaptability to complex scenarios, enabling efficient mission execution and information acquisition. In recent years, deep reinforcement learning has been introduced into UAV cluster formation control to address the dimension explosion of clusters and the difficulty of modelling cluster systems, but it suffers from problems such as low training efficiency. This paper proposes a cluster formation method based on an improved proximal policy optimization (PPO) method. By introducing a dynamic estimation method as the evaluation mechanism, it addresses the slow convergence of traditional PPO and its neglect of high-value actions, and effectively improves data utilization. Simulation experiments show that the method improves training efficiency, resolves the sample reuse problem, and delivers good decision-making performance.
Authors
全家乐
马先龙
沈昱恒
QUAN Jiale; MA Xianlong; SHEN Yuheng (School of Astronautics, Northwestern Polytechnical University, Xi’an 710129, Shaanxi, China; Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China)
Source
《空天防御》
2024, No. 2, pp. 52-62 (11 pages)
Air & Space Defense
Funding
National Natural Science Foundation of China (61473226).
Keywords
unmanned aerial vehicle cluster
deep reinforcement learning
proximal policy optimization
inverse reinforcement learning
cluster decision making