Abstract
To address the problem of multiple unmanned surface vehicles (USVs) hunting an escaping target at sea, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, with the cooperative attack of USVs as the background, an environment and a kinematic model are established for the boundary-free hunting problem, and criteria for successful hunting are given according to the requirements of rapidity and encirclement. Then, a Markov decision process framework is built on the multi-agent proximal policy optimization (MAPPO) algorithm. Tailored to the hunting task, it consists of a state space that is both scalable and permutation invariant, an action space that decouples capture distance from azimuth, and a reward function that combines a capture reward with a step reward. Finally, the hunting policy is trained under a centralized-training, distributed-execution architecture. Curriculum learning is used during training to make the network converge quickly, and the USVs share the same policy while executing actions independently. Simulation results show that, under test conditions with different initial numbers of USVs, the proposed method outperforms other algorithms in hunting success rate and timeliness. In addition, when some USVs are damaged or fail, the remaining USVs can still continue the hunting task, which demonstrates strong robustness and potential for deployment in real environments.
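The abstract names the MDP design choices (a scalable, permutation-invariant state; a distance/azimuth action; a capture-plus-step reward; a rapidity-and-encirclement success test) without giving formulas. The Python sketch below shows one plausible reading of each, purely for illustration. Everything in it is an assumption, not the authors' formulation: the function names (`observation`, `action_to_waypoint`, `is_captured`, `reward`), the mean pooling of teammate offsets, the "largest angular gap below π" encirclement test, and the reward constants.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]  # planar (x, y) position


def rel(a: Vec, b: Vec) -> Vec:
    """Relative position of b as seen from a."""
    return (b[0] - a[0], b[1] - a[1])


def observation(i: int, usvs: List[Vec], target: Vec) -> List[float]:
    """Hypothetical observation for USV i: target offset plus the mean of
    teammate offsets. Mean pooling keeps the length fixed for any team size
    (scalability) and ignores teammate ordering (permutation invariance)."""
    me = usvs[i]
    tx, ty = rel(me, target)
    others = [rel(me, p) for j, p in enumerate(usvs) if j != i]
    if not others:  # lone survivor: no teammate information
        return [tx, ty, 0.0, 0.0]
    mx = sum(o[0] for o in others) / len(others)
    my = sum(o[1] for o in others) / len(others)
    return [tx, ty, mx, my]


def action_to_waypoint(target: Vec, dist: float, azimuth: float) -> Vec:
    """Hypothetical decoupled action: a distance and an azimuth (radians)
    relative to the target are chosen independently, then combined."""
    return (target[0] + dist * math.cos(azimuth),
            target[1] + dist * math.sin(azimuth))


def is_captured(usvs: List[Vec], target: Vec, radius: float,
                step: int, max_steps: int) -> bool:
    """Assumed success test: within the step budget (rapidity), every USV is
    within `radius` of the target, and the largest angular gap between
    neighbouring USVs, seen from the target, is below pi (encirclement)."""
    if step > max_steps or not usvs:
        return False
    bearings = []
    for p in usvs:
        dx, dy = rel(target, p)
        if math.hypot(dx, dy) > radius:
            return False
        bearings.append(math.atan2(dy, dx))
    bearings.sort()
    gaps = [b - a for a, b in zip(bearings, bearings[1:])]
    gaps.append(2 * math.pi - (bearings[-1] - bearings[0]))  # wrap-around gap
    return max(gaps) < math.pi


CAPTURE_REWARD = 10.0  # assumed terminal bonus for a successful capture
STEP_PENALTY = -0.01   # assumed per-step cost that favours fast capture


def reward(captured: bool) -> float:
    """Capture reward combined with a step reward (here a small penalty)."""
    return CAPTURE_REWARD if captured else STEP_PENALTY
```

For example, four USVs at unit distance on the axes around a target at the origin pass `is_captured` with radius 2. Three USVs all on the target's east side fail it, because the wrap-around gap (3π/2) exceeds π. Because the observation has a fixed length, a single shared policy could, under these assumptions, keep running after some USVs are lost.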
Authors
XIA Jia-wei
ZHU Xu-fang
ZHANG Jian-qiang
LUO Ya-song
LIU Zhong
Affiliations: College of Weaponry Engineering, Naval University of Engineering, Wuhan 430033, China; College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
Source
Control and Decision (《控制与决策》)
EI
CSCD
Peking University Core Journal (北大核心)
2023, No. 5, pp. 1438-1447 (10 pages)
Funding
China Postdoctoral Science Foundation (2016T45686)
Natural Science Foundation of Hubei Province (2018CFC865)
All-Army Military Research Project (YJ2020B117)
Keywords
USV
multi-agent
reinforcement learning
deep learning
cooperative hunting
proximal policy optimization