Multi-target UAV Path Planning Based on Deep Reinforcement Learning (cited by: 1)
Abstract Unmanned aerial vehicles (UAVs) have been an active research area in recent years, and their high mobility makes them useful in many fields. In search and rescue, UAVs under semi-autonomous or autonomous programmed flight control can help rescuers complete their missions more effectively. A search and rescue mission involves searching among multiple targets. Compared with single-target search, it requires more complex algorithms or some form of reward shaping to mitigate the sparse reward problem. Search and rescue tasks are also more time-sensitive than typical reinforcement learning problems. How to use prior knowledge of search and rescue to improve the algorithm, and thereby raise task-completion efficiency and reduce training time, is a research focus for machine learning applications. In the context of search and rescue missions, this paper studies UAV path planning for multi-target problems. An existing deep reinforcement learning algorithm is improved based on the concept of hierarchical learning, and MTDDPG, a deep reinforcement learning algorithm suited to multi-target tasks, is proposed. The algorithm combines environment partitioning and reward shaping: environment partitioning simplifies the rescue scene to shorten training time, and reward shaping then improves the efficiency of task completion. Together these improve the training speed and efficiency of MTDDPG on multi-target search and rescue tasks. Three simulation experiments are designed to verify the algorithm. The environment is modeled under different prior information, and the results of different algorithms on multi-target tasks are compared. In addition, MTDDPG is trained in scenarios with different degrees of prior-information completeness, and the results are compared. The results show that MTDDPG can effectively solve the search problem and complete the specified tasks in multi-target search and rescue missions.
Authors CHEN Yuhong (陈昱宏); GAO Feifei (高飞飞) (Institute of Information Processing, Department of Automation, Tsinghua University, Beijing 100084, China)
Source Radio Communications Technology (《无线电通信技术》), 2022, No. 6, pp. 957-970 (14 pages)
Funding National Key Research and Development Program of China (2018AAA0102401).
Keywords multi-target; sparse reward; hierarchical learning; indoor rescue; UAV