Abstract
The adaptive heuristic critic (AHC) reinforcement learning architecture approximates the value function and the policy function of a Markov decision process (MDP) separately, and policy gradient reinforcement learning can convert a stochastic MDP into a deterministic one. By combining AHC reinforcement learning with policy gradient reinforcement learning, the PID controller parameters are tuned adaptively online, which optimizes the attitude control performance of an unmanned helicopter online. Simulation results show that, compared with a controller with fixed PID parameters, the algorithm adjusts the controller parameters online and controls the hovering attitude of the unmanned helicopter well.
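The abstract gives no equations, so the following is only a minimal sketch of how an actor-critic (AHC-style) loop with a Gaussian policy-gradient actor might tune PID gains online. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy second-order plant (not a helicopter model), the gain bounds `K_MAX`, the sample time `DT`, the bounded features, the reward, the clipping of the TD error, the learning rates, and the state-independent policy mean.

```python
import math
import random

K_MAX = (10.0, 2.0, 5.0)  # assumed upper bounds for (Kp, Ki, Kd)
DT = 0.02                 # assumed sample time in seconds


def clip(x, lo, hi):
    """Limit x to the closed interval [lo, hi]."""
    return min(max(x, lo), hi)


def features(err, derr):
    """Bounded critic features built from the attitude error and its rate."""
    return (1.0, math.tanh(err), math.tanh(derr))


def tune_pid(episodes=20, steps=200, seed=0):
    """Adapt the mean PID gains with a TD critic and a Gaussian actor."""
    rng = random.Random(seed)
    mean = [2.0, 0.1, 0.5]    # actor parameters: mean of the sampled gains
    sigma = (0.3, 0.05, 0.1)  # fixed exploration noise per gain
    w = [0.0, 0.0, 0.0]       # critic weights, V(s) = w . phi(s)
    alpha, beta, gamma = 0.001, 0.05, 0.95
    target = 0.2              # assumed attitude setpoint in radians

    for _episode in range(episodes):
        theta, omega, integ = 0.0, 0.0, 0.0
        prev_err = target - theta
        for _step in range(steps):
            err = target - theta
            derr = (err - prev_err) / DT
            integ += err * DT
            phi = features(err, derr)

            # Actor: sample gains from the Gaussian policy, clip before use.
            raw = [rng.gauss(m, s) for m, s in zip(mean, sigma)]
            kp, ki, kd = (clip(a, 0.0, k) for a, k in zip(raw, K_MAX))
            u = clip(kp * err + ki * integ + kd * derr, -5.0, 5.0)

            # Toy damped second-order plant (illustrative assumption only).
            omega += (-1.5 * omega + 4.0 * u) * DT
            theta += omega * DT

            next_err = target - theta
            reward = -min(next_err * next_err, 1.0)
            next_phi = features(next_err, (next_err - err) / DT)

            # Critic: TD(0) error, clipped here as a stabilizing assumption.
            v = sum(wi * p for wi, p in zip(w, phi))
            v_next = sum(wi * p for wi, p in zip(w, next_phi))
            delta = clip(reward + gamma * v_next - v, -1.0, 1.0)
            w = [wi + beta * delta * p for wi, p in zip(w, phi)]

            # Actor: step along delta * grad log pi, where the gradient of a
            # Gaussian log-density with respect to its mean is (a - m) / s^2.
            mean = [clip(m + alpha * delta * (a - m) / (s * s), 0.0, k)
                    for m, a, s, k in zip(mean, raw, sigma, K_MAX)]
            prev_err = err
    return mean
```

Calling `tune_pid()` returns the adapted mean gains as a list `[Kp, Ki, Kd]`. The critic's TD error plays the role of the heuristic evaluation signal, and the actor moves the gain means in the direction that the TD error rates as better than expected.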
Source
《弹箭与制导学报》
CSCD
Peking University Core Journal (北大核心)
2008, No. 2, pp. 73-76 (4 pages)
Journal of Projectiles, Rockets, Missiles and Guidance