Abstract
Unmanned aerial vehicles (UAVs) can adapt to complex terrain, but limited battery capacity and other constraints prevent them from operating for long periods. Cooperation between UAVs and other unmanned systems, such as unmanned ground vehicles (UGVs) and unmanned surface vessels, can effectively extend a UAV's working time and help it complete its assigned tasks. Once a UAV has finished its task, landing it quickly and stably on a mobile platform is both necessary and challenging. To address this landing problem, this paper proposes a deep reinforcement learning proportional-integral-derivative (PID) method based on the COACH (corrective advice communicated by humans) framework, which provides an optimal path for a UAV landing on a mobile platform. First, the reinforcement learning model is trained with the corrective framework in a simulated environment. Next, the trained model outputs control parameters in both simulated and real environments. Finally, these parameters are used to compute the UAV's position control commands. Simulation results and real UAV experiments show that the proposed method outperforms traditional control methods and can stably complete landings on a mobile platform.
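The control pipeline summarized in the abstract (a learned model supplies PID parameters, and the PID law turns them into a position control command) can be illustrated with a minimal single-axis sketch. Everything here is an assumption made for illustration: the names `PIDGains`, `AdaptivePID`, `placeholder_policy` and `simulate`, the gain values, the velocity-command UAV model and the constant-velocity platform are not taken from the paper. The hand-written `placeholder_policy` merely stands in for the trained agent; the COACH training stage is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDGains:
    kp: float
    ki: float
    kd: float


class AdaptivePID:
    """Single-axis PID whose gains may change at every control step."""

    def __init__(self, dt: float) -> None:
        self.dt = dt
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, error: float, gains: PIDGains) -> float:
        # Accumulate the integral term and take a backward-difference derivative.
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return gains.kp * error + gains.ki * self.integral + gains.kd * derivative


def placeholder_policy(error: float) -> PIDGains:
    """Hypothetical stand-in for the trained agent: maps the state to PID gains."""
    kp = 1.0 + 0.5 * min(abs(error), 2.0)  # assumed rule: larger error, larger kp
    return PIDGains(kp=kp, ki=0.8, kd=0.1)


def simulate(steps: int = 400, dt: float = 0.05) -> float:
    """Track a platform moving at constant speed; return the final absolute error."""
    pid = AdaptivePID(dt)
    uav_pos, platform_pos, platform_vel = 0.0, 2.0, 0.5
    for _ in range(steps):
        error = platform_pos - uav_pos
        velocity_cmd = pid.step(error, placeholder_policy(error))
        uav_pos += velocity_cmd * dt        # assumed model: UAV follows velocity commands
        platform_pos += platform_vel * dt   # assumed model: platform moves at constant speed
    return abs(platform_pos - uav_pos)
```

Calling `simulate()` returns the remaining horizontal offset between the UAV and the platform after 20 simulated seconds. In the method described above, the gains would come from the trained reinforcement learning model rather than from a fixed rule.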
Authors
ZHANG Pengpeng (张鹏鹏); WEI Changyun (魏长赟); ZHANG Kairui (张恺睿); OUYANG Yongping (欧阳勇平)
College of Mechanical and Electrical Engineering, Hohai University, Changzhou 213022, China
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
Indexed in: CSCD; Peking University Core Journals
2022, Issue 5, pp. 931-940 (10 pages)
Funding
National Natural Science Foundation of China (61703138)
Fundamental Research Funds for the Central Universities (B200202224)
Keywords
autonomous landing
reinforcement learning
path planning
COACH framework
deterministic policy gradient
air-ground cooperation
UAV
optimal control