Abstract
Obstacle avoidance is an essential part of motion planning for autonomous unmanned systems such as unmanned aerial vehicles (UAVs), and its core is the design of an effective avoidance control method. To further improve decision optimality and control performance, this paper proposes a reinforcement-learning-based autonomous obstacle avoidance control method, formulated in an optimal control setting, that generates safe motion trajectories online in an adaptive manner. First, the barrier function method is used to design a smooth penalty function in the cost function, which transforms the avoidance problem into an unconstrained optimal control problem. Then, adaptive reinforcement learning is implemented with an actor-critic neural network architecture and policy iteration: the critic network approximates the cost function using state-following kernel functions, while the actor network provides an approximately optimal control policy. During learning, simulated experience is obtained through state extrapolation, so that the critic network can use experience replay for reliable local exploration. Finally, simulation experiments and method comparisons are carried out on a simplified UAV system and a nonlinear numerical system. The results show that the proposed method generates relatively good safe motion trajectories in real time.
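The idea of folding the avoidance constraint into the cost through a smooth barrier-style penalty can be illustrated with a minimal sketch. The abstract does not give the paper's actual penalty function, so everything below is an assumption made for illustration: the function names `smooth_penalty` and `running_cost`, the circular obstacle with `radius` and `detect_radius`, the quadratic state and control weights `q` and `r`, and the specific rational form of the penalty.

```python
import math


def smooth_penalty(x, obstacle, radius, detect_radius):
    """Hypothetical barrier-style penalty for one circular obstacle.

    Zero outside detect_radius, grows without bound as the distance
    approaches radius, and is continuously differentiable at detect_radius
    because the numerator vanishes there and the ratio is squared.
    """
    d = math.dist(x, obstacle)
    if d >= detect_radius:
        return 0.0
    if d <= radius:
        return math.inf  # inside the obstacle: infeasible state
    return ((detect_radius**2 - d**2) / (d**2 - radius**2)) ** 2


def running_cost(x, u, obstacle, radius=1.0, detect_radius=3.0, q=1.0, r=0.1):
    """Quadratic state/control cost plus the obstacle penalty (illustrative)."""
    state_cost = q * sum(xi * xi for xi in x)
    control_cost = r * sum(ui * ui for ui in u)
    return state_cost + control_cost + smooth_penalty(x, obstacle, radius, detect_radius)


if __name__ == "__main__":
    obstacle = (0.0, 0.0)
    for x in [(5.0, 0.0), (2.5, 0.0), (1.2, 0.0)]:
        print(x, running_cost(x, (0.0, 0.0), obstacle))
```

Because the penalty is part of the running cost, an optimal controller that minimizes the accumulated cost is steered away from the obstacle without an explicit state constraint, which is the sense in which the problem becomes unconstrained.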
Authors
王珂
穆朝絮
蔡光斌
汪韧
孙长银
Ke WANG; Chaoxu MU; Guangbin CAI; Ren WANG; Changyin SUN (School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; College of Missile Engineering, Rocket Force University of Engineering, Xi'an 710025, China; R&D Center, China Academy of Launch Vehicle Technology, Beijing 100076, China; School of Automation, Southeast University, Nanjing 210096, China)
Source
《中国科学:信息科学》
CSCD
Peking University Core Journals (北大核心)
2022, Issue 9, pp. 1672-1686 (15 pages)
Scientia Sinica(Informationis)
Funding
National Key Research and Development Program of China (Grant No. 2021YFB1714700)
National Natural Science Foundation of China (Grant No. 62022061)
Keywords
autonomous unmanned systems
obstacle avoidance control
reinforcement learning
neural networks
experience replay