Autopilot parameter rapid tuning method based on deep reinforcement learning

Abstract: When deep reinforcement learning (DRL) is used to train autopilot control parameters, training is slow and the reward function converges poorly. To address these problems, an intelligent training method is proposed that builds on the three-loop autopilot pole placement algorithm and converts the three-dimensional control parameters into a one-dimensional design parameter. An intelligent control architecture is constructed that combines offline DRL training with real-time online computation by a multi-layer perceptron (MLP) neural network. This architecture improves the efficiency of the DRL algorithm and the convergence of the reward function, while ensuring rapid online self-tuning of the control parameters over a wide range of flight conditions. DRL training and neural network deployment are carried out for a typical reentry vehicle. Simulation results show that simplifying the reinforcement learning action space makes training more efficient, and that the trained autopilot tracks control commands with an error within 1.2%.
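The central step of the method, collapsing three autopilot gains into one design parameter through pole placement, can be illustrated with a generic sketch. It is not the authors' three-loop autopilot formulation, which the abstract does not give. It assumes a hypothetical third-order plant in companion form with full state feedback, fixes the damping ratio `zeta` and the real-pole ratio `ratio`, and lets a single natural frequency `omega` set all three closed-loop poles. The names `CompanionPlant`, `desired_coefficients`, and `gains_from_design_parameter` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanionPlant:
    """Hypothetical plant x''' + a2*x'' + a1*x' + a0*x = u.

    An illustrative stand-in, NOT the paper's three-loop airframe model.
    """
    a0: float
    a1: float
    a2: float


def desired_coefficients(omega, zeta=0.7, ratio=1.0):
    """Coefficients (d2, d1, d0) of (s + ratio*omega)(s^2 + 2*zeta*omega*s + omega^2).

    zeta and ratio are held fixed, so omega alone sets the whole pole pattern.
    """
    p = ratio * omega
    d2 = p + 2.0 * zeta * omega
    d1 = 2.0 * zeta * omega * p + omega ** 2
    d0 = p * omega ** 2
    return d2, d1, d0


def gains_from_design_parameter(plant, omega, zeta=0.7, ratio=1.0):
    """Map the scalar omega to three state-feedback gains (k0, k1, k2).

    For u = v - (k0*x + k1*x' + k2*x''), with v the reference input, the
    closed-loop characteristic polynomial is
    s^3 + (a2 + k2)s^2 + (a1 + k1)s + (a0 + k0); matching it term by term to
    the desired polynomial gives the gains directly.
    """
    if omega <= 0.0:
        raise ValueError("omega must be positive")
    d2, d1, d0 = desired_coefficients(omega, zeta, ratio)
    return d0 - plant.a0, d1 - plant.a1, d2 - plant.a2


if __name__ == "__main__":
    plant = CompanionPlant(a0=0.0, a1=2.0, a2=1.5)  # made-up coefficients
    for omega in (2.0, 5.0, 10.0):
        k0, k1, k2 = gains_from_design_parameter(plant, omega)
        print(f"omega={omega:5.1f} -> k0={k0:8.2f}, k1={k1:7.2f}, k2={k2:6.2f}")
```

Because `zeta` and `ratio` are frozen, a learning agent in this kind of setup would only need to choose `omega`, a scalar action, instead of three coupled gains. This is the kind of action-space reduction the abstract credits with faster training and better reward convergence.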
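The online half of the architecture can be sketched as a small MLP forward pass that maps a normalized flight state to the scalar design parameter. Everything specific here is assumed: the two inputs (altitude and Mach number), their ranges, the min-max scaling used as the normalization step, the layer sizes, the output bounds, and the weights. The abstract only states that an MLP performs real-time computation after offline DRL training. This sketch further assumes that the MLP outputs the one-dimensional design parameter, which would then be converted to gains by pole placement, and that the network is fitted offline to design parameters produced by DRL training.

```python
import math

# Hypothetical flight-state inputs and ranges (altitude in m, Mach number).
# The abstract does not say which states the MLP uses; these are placeholders.
STATE_MIN = (20_000.0, 3.0)
STATE_MAX = (60_000.0, 15.0)
# Hypothetical admissible range of the one-dimensional design parameter.
OMEGA_MIN, OMEGA_MAX = 2.0, 20.0


def normalize(state):
    """Min-max scale each input to [0, 1], clipping values outside the training range."""
    return [min(max((x - lo) / (hi - lo), 0.0), 1.0)
            for x, lo, hi in zip(state, STATE_MIN, STATE_MAX)]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def mlp_design_parameter(state, params):
    """Normalized state -> tanh hidden layer -> sigmoid output scaled to [OMEGA_MIN, OMEGA_MAX]."""
    hidden = dense(normalize(state), params["W1"], params["b1"], math.tanh)
    (y,) = dense(hidden, params["W2"], params["b2"], sigmoid)
    return OMEGA_MIN + (OMEGA_MAX - OMEGA_MIN) * y


# Placeholder weights for illustration only (not trained values).
PARAMS = {
    "W1": [[0.8, -0.5], [-0.3, 1.2], [0.5, 0.5]],
    "b1": [0.0, 0.1, -0.2],
    "W2": [[1.0, -0.7, 0.4]],
    "b2": [0.05],
}

if __name__ == "__main__":
    for altitude, mach in ((25_000.0, 12.0), (40_000.0, 8.0), (55_000.0, 5.0)):
        omega = mlp_design_parameter((altitude, mach), PARAMS)
        print(f"h={altitude:8.0f} m, Ma={mach:4.1f} -> design parameter {omega:.3f}")
```

Bounding the output with a scaled sigmoid keeps the online value inside an admissible range even for flight states at the edge of the training envelope, and a forward pass this small needs only a handful of multiply-adds, which fits the real-time requirement.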

Authors: WAN Qitian, LU Baogang, ZHAO Yaxin, WEN Qiuqiu

Affiliations: School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China; Beijing Institute of Space Long March Vehicle, Beijing 100076, China; China Academy of Launch Vehicle Technology, Beijing 100076, China

Source: Systems Engineering and Electronics, 2022, No. 10, pp. 3190-3199 (10 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: Supported by the Aeronautical Science Foundation of China (202037012003).

Keywords: reinforcement learning; autopilot; parameter tuning; intelligent control; normalization

CLC number: TP181 [Automation and Computer Technology / Control Theory and Control Engineering]
