Design of an Attitude Controller for an Unmanned Helicopter Based on a Reinforcement Learning Algorithm (cited by: 1)

Original title: 基于增强学习的无人直升机姿态控制器设计

Abstract: The adaptive heuristic critic (AHC) reinforcement learning architecture approximates the value function and the policy function of a Markov decision process (MDP) separately, and policy gradient reinforcement learning can convert a stochastic MDP into a deterministic one. By combining AHC reinforcement learning with policy gradient reinforcement learning, the parameters of a PID controller are tuned adaptively online, so that the attitude control performance of the unmanned helicopter is optimized online. Simulation results show that, compared with a controller using fixed PID parameters, the algorithm adjusts the controller parameters online and controls the hovering attitude of the unmanned helicopter well.

Authors: 蔡文澜, 王俊生, 税海涛, 马宏绪, 黄茜薇

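Affiliation: College of Mechatronics Engineering and Automation, National University of Defense Technology (国防科学技术大学机电工程与自动化学院)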

Source: Journal of Projectiles, Rockets, Missiles and Guidance (《弹箭与制导学报》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 2, pp. 73-76 (4 pages)

Keywords: unmanned helicopter; reinforcement learning; adaptive heuristic critic; policy gradient; PEGASUS

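Classification: V249.1 [Aerospace Science and Technology: Flight Vehicle Design]; TP273 [Automation and Computer Technology: Detection Technology and Automation Devices]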
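The abstract describes an actor-critic arrangement in which a critic estimates the value function and a policy gradient step adjusts PID gains online. The record does not include the paper's equations or helicopter model, so the following is only a minimal sketch under stated assumptions and not the authors' algorithm: the plant is a toy damped double integrator standing in for one attitude axis, the critic is a linear TD(0) estimator, the actor is a Gaussian policy over the gains, and every name and constant (`run_episode`, `train`, `U_MAX`, the learning rates, the cost weights) is hypothetical. The fixed seed only makes the run reproducible; it is not an implementation of PEGASUS.

```python
import random

GAIN_MIN, GAIN_MAX = 0.0, 10.0   # assumed safe range for the PID gain means
U_MAX = 5.0                      # assumed actuator saturation


def features(theta, omega):
    """Bounded quadratic features for the linear critic."""
    return [min(theta * theta, 1.0), min(omega * omega, 1.0), 1.0]


def value(w, phi):
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def run_episode(mu, w, rng, sigma=0.05, steps=200, dt=0.02,
                alpha_actor=1e-3, alpha_critic=1e-2, gamma=0.95):
    """Run one hover-hold episode, updating actor means `mu` and critic `w` in place.

    Returns the accumulated quadratic cost of the episode.
    """
    theta, omega, integral = 0.2, 0.0, 0.0   # attitude error, rate, error integral
    total_cost = 0.0
    for _ in range(steps):
        phi = features(theta, omega)
        # Actor: sample PID gains from a Gaussian centred on the current means.
        noise = [rng.gauss(0.0, sigma) for _ in mu]
        kp, ki, kd = (m + n for m, n in zip(mu, noise))
        # PID law for a zero attitude setpoint (error = -theta).
        integral += -theta * dt
        u = kp * (-theta) + ki * integral + kd * (-omega)
        u = max(-U_MAX, min(U_MAX, u))
        # Toy plant (assumption): damped double integrator, explicit Euler step.
        omega += (u - 0.5 * omega) * dt
        theta += omega * dt
        cost = (theta * theta + 0.1 * omega * omega) * dt
        total_cost += cost
        # Critic: TD(0) error with reward = -cost.
        phi_next = features(theta, omega)
        delta = -cost + gamma * value(w, phi_next) - value(w, phi)
        for i in range(len(w)):
            w[i] += alpha_critic * delta * phi[i]
        # Actor: score-function gradient step on the gain means, then clip.
        for i in range(len(mu)):
            mu[i] += alpha_actor * delta * noise[i] / (sigma * sigma)
            mu[i] = max(GAIN_MIN, min(GAIN_MAX, mu[i]))
    return total_cost


def train(episodes=5, seed=0):
    """Tune the gain means over several episodes; returns (gains, per-episode costs)."""
    rng = random.Random(seed)
    mu = [2.0, 0.0, 1.0]      # initial Kp, Ki, Kd (arbitrary starting point)
    w = [0.0, 0.0, 0.0]       # critic weights
    costs = [run_episode(mu, w, rng) for _ in range(episodes)]
    return mu, costs
```

Calling `train(episodes=5, seed=0)` returns the tuned gain means and the cost of each episode. Whether the cost falls depends on the assumed learning rates and plant, so the sketch makes no claim about matching the simulation results reported in the abstract.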


