Abstract
Multiple unmanned ground vehicle (multi-UGV) dispersion is widely used in military combat missions, but existing methods are complex, slow to plan, and of limited applicability. To address these problems, this paper proposes a multi-UGV dispersion strategy based on the AUction Multi-Agent Deep Deterministic Policy Gradient (AU-MADDPG) algorithm. Building on a single-UGV model, a multi-UGV dispersion model is established using deep reinforcement learning. The MADDPG structure is then optimized: an auction algorithm computes the dispersion point assigned to each UGV such that the total path length is minimized, which reduces the randomness of dispersion-point allocation, and MADDPG then plans the paths, improving both training efficiency and running efficiency. The reward function is also optimized to cover both the course of training and its end, so that the constraints are considered comprehensively; the multi-constraint problem is thereby converted into a reward-function design problem whose goal is to maximize the reward. Simulation results show that, compared with the traditional MADDPG algorithm, the proposed algorithm reduces training time by 3.96% and total path length by 14.50%. It solves the dispersion problem more effectively and can serve as a general solution for problems of this kind.
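The assignment step described in the abstract, choosing one dispersion point per vehicle so that the total path length is as short as possible, can be illustrated with a generic forward auction (Bertsekas-style). The sketch below is not the paper's implementation. It assumes straight-line Euclidean distance as a stand-in for path length and equal numbers of vehicles and points; the function names, the coordinates, and the bid increment `eps` are all hypothetical.

```python
import math


def auction_assign(vehicles, points, eps=1e-3):
    """Assign each vehicle to a distinct point with a forward auction.

    Generic sketch, not the paper's method. The result is within
    len(vehicles) * eps of the minimum total straight-line distance.
    """
    n = len(vehicles)
    assert n == len(points), "sketch assumes as many points as vehicles"
    # Value is negative distance, so maximizing value minimizes length.
    value = [[-math.dist(v, p) for p in points] for v in vehicles]
    price = [0.0] * n       # current price of each point
    owner = [None] * n      # owner[j] = vehicle currently holding point j
    assigned = [None] * n   # assigned[i] = point currently held by vehicle i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of every point for bidder i at current prices.
        net = [value[i][j] - price[j] for j in range(n)]
        best = max(range(n), key=net.__getitem__)
        second = max((net[j] for j in range(n) if j != best), default=net[best])
        # Raise the price by the bidder's margin plus eps.
        price[best] += net[best] - second + eps
        previous = owner[best]
        if previous is not None:
            # The outbid vehicle returns to the pool of bidders.
            assigned[previous] = None
            unassigned.append(previous)
        owner[best] = i
        assigned[i] = best
    return assigned


def total_length(vehicles, points, assigned):
    """Sum of straight-line distances for a given assignment."""
    return sum(math.dist(vehicles[i], points[j]) for i, j in enumerate(assigned))


# Hypothetical layout: three vehicles on x = 0, three points on x = 5.
vehicles = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
points = [(5.0, 2.0), (5.0, 0.0), (5.0, 1.0)]
assigned = auction_assign(vehicles, points)
length = total_length(vehicles, points, assigned)
```

In this layout each vehicle ends up with the point directly across from it. In the pipeline the abstract describes, a fixed assignment of this kind would be produced before MADDPG plans the paths, so that target allocation is no longer left to chance during training.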
Authors
郭宏达
娄静涛
杨珍珍
徐友春
GUO Hongda; LOU Jingtao; YANG Zhenzhen; XU Youchun (Army Military Transportation University, Tianjin 300161, China)
Source
《电子与信息学报》
EI
CAS
CSCD
PKU Core
2024, No. 1, pp. 287-298 (12 pages)
Journal of Electronics & Information Technology
Keywords
Path planning
Deep reinforcement learning
Multi-UGV
Dispersion strategy
Auction algorithm