Abstract
The dynamics of an autonomous underwater vehicle (AUV) form a multi-input multi-output (MIMO), underactuated, strongly coupled, nonlinear system, and the AUV operates in a complex and changing environment, so high-precision attitude control is challenging. To address this problem, this paper proposes an AUV attitude control method based on a reinforcement learning compensator. By learning from historical experience data, the method rejects unmodeled, uncertain disturbances during field operation and progressively improves attitude control performance. The main contributions are: (1) a conventional controller is combined with a reinforcement learning compensator, where the conventional controller keeps the system stable while the compensator is being trained, and the trained compensator rejects uncertain disturbances and improves the final performance; (2) the conventional quadratic reward function used in reinforcement learning is modified, which speeds up training and improves the final control performance; (3) simulations verify that, with randomly initialized neural network weights, the proposed reinforcement learning compensation controller converges after training to stable and consistent performance.
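The abstract does not state the control law, the network structure, or the modified reward function. As a rough illustration only, the sketch below shows one common way to combine a conventional controller with a learned compensator (an additive correction on a single attitude channel), together with the conventional quadratic reward that the paper takes as its starting point. The PD baseline, the gains `kp` and `kd`, the weights `q` and `r`, the saturation `limit`, and all function names are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def pd_control(error, error_rate, kp=2.0, kd=0.5):
    """Conventional PD law; the gains are illustrative assumptions."""
    return kp * error + kd * error_rate

def compensated_control(error, error_rate, compensator, limit=1.0):
    """Add the learned correction to the baseline command, then saturate."""
    state = np.array([error, error_rate])
    u = pd_control(error, error_rate) + compensator(state)
    return float(np.clip(u, -limit, limit))

def quadratic_reward(error, u, q=1.0, r=0.1):
    """Conventional quadratic reward: penalize attitude error and control effort.
    The paper's modified reward is not given in the abstract and is not shown here."""
    return -(q * error**2 + r * u**2)

# Hypothetical untrained compensator that outputs zero, so the baseline acts alone.
zero_compensator = lambda state: 0.0

u = compensated_control(0.2, -0.1, zero_compensator)
reward = quadratic_reward(0.2, u)
```

With a zero compensator the command reduces to the PD output, which mirrors the idea that the conventional controller carries the system while the compensator is still untrained; a trained policy network would replace `zero_compensator`.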
Authors
彭泽华
林晓波
潘光帅
PENG Zehua; LIN Xiaobo; PAN Guangshuai (Underwater Vehicle Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
Network New Media Technology (《网络新媒体技术》)
2023, No. 6, pp. 36-43 (8 pages)
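Funding
Fund of the Key Laboratory of National Defense Science and Technology, Chinese Academy of Sciences (No. E229150101)
National Natural Science Foundation of China (No. 61971412)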
Keywords
AUV
reinforcement learning
attitude control
motion simulation
neural network