Abstract
This paper explores using a deep reinforcement learning algorithm to train an agent that replaces human engineers in the offline design of rocket attitude controller parameters. A frequency-domain analysis model of the rocket at multiple characteristic flight times (characteristic seconds) is established, and the design parameters are selected. The double deep Q-network (DDQN) algorithm is chosen as the training algorithm; through experience replay and temporal-difference updates, the agent learns continuously while interacting with the environment. A corresponding Markov decision process model is designed, and the agent is trained and then evaluated in forward tests. The results indicate that the method has some reference value for launch vehicle attitude control design.
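As a rough illustration of the two mechanisms the abstract names, experience replay and temporal-difference updates in DDQN, the sketch below stores transitions in a buffer and computes Double DQN targets, where the online network selects the greedy next action and the target network evaluates it. It is only a generic sketch under stated assumptions: `ReplayBuffer`, `ddqn_targets`, the discount factor, and the array shapes are hypothetical names and choices, not taken from the paper, whose state, action, and reward definitions are not given in the abstract.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, returned as one array per field.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones


def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN temporal-difference targets.

    q_online_next, q_target_next: arrays of shape (batch, n_actions) holding the
    Q-values of the next states from the online and target networks (assumed to
    be computed elsewhere).
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    # Action selection uses the online network...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...while action evaluation uses the target network.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done = 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_values
```

In a full training loop, the online network would be regressed toward these targets on each sampled minibatch, and the target network would be synchronized with the online network periodically.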
Authors
Huang Xu (黄旭); Liu Jiarun (柳嘉润); Luo Wuyi (骆无意) (Beijing Aerospace Automatic Control Institution, Beijing 100854, China; National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing 100854, China)
Source
Aerospace Control (《航天控制》)
CSCD
Peking University Core Journal
2020, Issue 4, pp. 3-8 (6 pages)
Keywords
Deep reinforcement learning
Attitude controller
Frequency-domain analysis
Parameter design