Abstract
Controlling an inverted pendulum system with a continuous state space and an unknown model has long been a difficult problem. This paper combines reinforcement learning with ELM-BP neural networks, using their generalization capability, to design a learning control strategy based on the Actor-Critic architecture. To handle the continuous state space, a BP network serves as the action network and maps the input state to the action to be executed, while an ELM network serves as the evaluation network and outputs the evaluation by approximating the value function. A sliding time window mechanism is introduced to reduce the size of the sample space, and eligibility traces are used to improve the convergence speed of the networks. After training and learning, the method can effectively solve the inverted pendulum control problem with a continuous state space. Experiments in a Matlab simulation of the inverted pendulum environment show good results on several measures used to assess inverted pendulum algorithms (number of trials, time required, maximum absolute angle, maximum absolute displacement, and others), which indicates that the method is feasible. The research further improves the application value of reinforcement learning theory in real control systems.
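The abstract gives no implementation details, so the following is only a minimal sketch of how an ELM evaluation network restricted to a sliding time window might look. Everything named here is an assumption for illustration and is not taken from the paper: the class `ElmCritic`, the parameters `n_hidden`, `window` and `ridge`, the tanh activation, the small ridge term, and the 4-dimensional state in the usage lines. The BP action network and the eligibility traces are left out.

```python
import math
import random
from collections import deque


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class ElmCritic:
    """Hypothetical ELM critic: fixed random hidden layer, output weights
    fitted by regularized least squares on the most recent samples only."""

    def __init__(self, n_inputs, n_hidden=20, window=200, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Hidden-layer weights and biases are drawn once and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden          # output weights
        self.ridge = ridge                    # assumed regularization term
        self.samples = deque(maxlen=window)   # sliding time window

    def hidden(self, state):
        return [math.tanh(sum(wi * s for wi, s in zip(w, state)) + b)
                for w, b in zip(self.w, self.b)]

    def value(self, state):
        return sum(h * bt for h, bt in zip(self.hidden(state), self.beta))

    def observe(self, state, target):
        # Once the window is full, the oldest sample is dropped automatically.
        self.samples.append((self.hidden(state), target))

    def fit(self):
        # Solve (H^T H + ridge * I) beta = H^T y over the current window.
        n = len(self.beta)
        a = [[self.ridge if i == j else 0.0 for j in range(n)]
             for i in range(n)]
        rhs = [0.0] * n
        for h, y in self.samples:
            for i in range(n):
                rhs[i] += h[i] * y
                for j in range(n):
                    a[i][j] += h[i] * h[j]
        self.beta = solve_linear(a, rhs)


# Usage with a hypothetical 4-dimensional state and an arbitrary target.
critic = ElmCritic(n_inputs=4)
critic.observe([0.0, 0.0, 0.05, 0.0], target=1.0)
critic.fit()
```

In an Actor-Critic loop the `target` passed to `observe` would typically be a temporal-difference target built from the reward and the critic's estimate for the next state; how the paper forms it is not stated in the abstract.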
Author
王婷婷
WANG Ting-ting (SINOPEC Geophysical Research Institute, Nanjing 211103, China)
Source
《电子设计工程》
2019, Issue 6, pp. 55-58, 63 (5 pages)
Electronic Design Engineering
Keywords
reinforcement learning
inverted pendulum
adaptive heuristic algorithm
BP neural network
ELM neural network
continuous space