Abstract
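To address the low learning efficiency of reinforcement learning (RL) in autonomous unmanned aerial vehicle (UAV) control, RL is combined with learning from demonstration, which uses expert experience to improve it, and a demonstration-assisted RL control algorithm for UAVs is proposed. By establishing a demonstration objective function and modifying the value function, expert experience is introduced into policy updates as a supervisory signal, so that it guides the optimization of the RL-based UAV autonomous control system. In addition, a replay buffer for expert experience samples is set up, and prioritized experience replay assigns different utilization rates to the experience samples, improving data efficiency. Simulation results show that, compared with an ordinary RL controller for UAVs, the proposed algorithm obtains rewards quickly in the early stage of training, achieves higher rewards throughout learning, and learns a control policy that responds faster and more accurately. Adding demonstration knowledge effectively guides learning, improves the learning efficiency of the UAV autonomous control system, and also improves algorithm performance, which helps it learn better control policies. Furthermore, demonstration knowledge broadens the variety of experience data, which helps stabilize the algorithm and makes the UAV autonomous control system robust to the choice of reward function.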
The practical application of reinforcement learning (RL) to unmanned aerial vehicle (UAV) control is restricted by low learning efficiency. An algorithm integrating RL with imitation learning was proposed to improve the performance of autonomous flight control systems. By establishing new loss and value functions, demonstrations were included as supervisory signals in the updates of the actor and critic networks. Two replay buffers were used to store demonstration data and the data generated by interacting with the environment, respectively. A prioritized experience replay mechanism assigns different utilization rates to experience samples during learning, making greater use of high-quality data. Simulation results showed that the RL control algorithm with demonstrations quickly obtained high rewards in the early stage of training and achieved higher rewards throughout training than the conventional RL algorithm. The control strategy obtained by the proposed algorithm had a faster response and higher control precision. Demonstrations improve both the performance of the algorithm and the learning efficiency of the UAV autonomous control system, which makes it easier to learn more effective control strategies. The addition of demonstrations also broadens the variety of experience data and increases the stability of the algorithm, making the UAV autonomous control system robust to the setting of the reward function.
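The abstract mentions two replay buffers and a prioritized replay mechanism but gives no formulas. The following is only a minimal sketch of how such sampling could look, not the authors' implementation. It assumes proportional prioritization on the absolute TD error and a fixed priority bonus for demonstration samples. The class name `DualReplay` and the parameters `alpha`, `eps`, and `demo_bonus` are hypothetical.

```python
import random


class DualReplay:
    """Toy prioritized replay over a demonstration buffer and an agent buffer.

    Assumption: priority = |TD error| + eps (+ demo_bonus for expert samples),
    and sampling probability is proportional to priority ** alpha.
    """

    def __init__(self, alpha=0.6, eps=1e-3, demo_bonus=1.0):
        self.alpha = alpha            # how strongly priorities skew sampling
        self.eps = eps                # keeps every priority strictly positive
        self.demo_bonus = demo_bonus  # extra priority for demonstration samples
        self.demo = []                # (priority, transition) pairs from the expert
        self.agent = []               # (priority, transition) pairs from interaction

    def add(self, transition, td_error, is_demo=False):
        """Store a transition in the buffer that matches its origin."""
        bonus = self.demo_bonus if is_demo else 0.0
        priority = abs(td_error) + self.eps + bonus
        (self.demo if is_demo else self.agent).append((priority, transition))

    def probabilities(self):
        """Sampling probabilities, demonstration samples first, then agent samples."""
        weights = [p ** self.alpha for p, _ in self.demo + self.agent]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, batch_size, rng=random):
        """Draw a mixed batch (with replacement) from both buffers."""
        pool = self.demo + self.agent
        picks = rng.choices(pool, weights=self.probabilities(), k=batch_size)
        return [transition for _, transition in picks]
```

With equal TD errors, a demonstration sample is drawn more often than an agent sample because of the assumed bonus. A full implementation would also refresh priorities after each update and usually correct the sampling bias with importance weights, both omitted here.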
Authors
孙丹
高东
郑建华
韩鹏
SUN Dan; GAO Dong; ZHENG Jianhua; HAN Peng (National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《北京航空航天大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2023, No. 6, pp. 1424-1433 (10 pages)
Journal of Beijing University of Aeronautics and Astronautics
Funding
Beijing Municipal Science and Technology Program (Z191100004319004).
Keywords
reinforcement learning (强化学习)
demonstrations (专家示教)
unmanned aerial vehicle (无人机)
autonomous control (自主控制)
learning systems (学习系统)