

Reinforcement learning research for the control of the inverted pendulum based on ELM-BP neural network
Abstract: Controlling an inverted pendulum system whose state space is continuous and whose model is unknown has long been a difficult problem. This paper combines reinforcement learning with ELM-BP neural networks, drawing on the networks' generalization capability, to design a new learning control strategy with an Actor-Critic architecture. To handle the continuous state space, a BP network serves as the action network and maps the input state to the action to execute, while an ELM network serves as the evaluation network and outputs an evaluation by approximating the value function. A sliding time window mechanism is introduced to reduce the size of the sample space, and eligibility traces are used to speed up network convergence. After training and learning, the method can effectively solve the inverted pendulum control problem with a continuous state space. In Matlab simulations of the inverted pendulum environment, the method gave good results on several measures of inverted pendulum control: number of trials, time required, maximum absolute angle, and maximum absolute displacement. This research further improves the application value of reinforcement learning theory in real control systems.
Author: WANG Ting-ting (王婷婷) (SINOPEC Geophysical Research Institute, Nanjing 211103, China)
Source: Electronic Design Engineering (《电子设计工程》), 2019, No. 6, pp. 55-58, 63 (5 pages)
Keywords: reinforcement learning; inverted pendulum; adaptive heuristic algorithm; BP neural network; ELM neural network; continuous space
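The evaluation network described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `ELMCritic`, the tanh activation, the layer sizes, and the hyperparameters `alpha`, `gamma`, and `lam` are all assumptions made for illustration. The sketch keeps the ELM idea of a fixed random hidden layer with trainable output weights only, but it replaces the closed-form least-squares solve usually associated with ELM by incremental TD(λ) updates with an accumulating eligibility trace. The BP action network and the sliding time window are omitted.

```python
import math
import random


class ELMCritic:
    """Value estimator with a fixed random hidden layer (the ELM idea).

    Only the linear output weights `beta` are trained. Names and defaults
    here are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Hidden-layer weights and biases are drawn once and never updated.
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_hidden)]
        self.bias = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden   # trainable output weights
        self.trace = [0.0] * n_hidden  # one eligibility trace per output weight

    def features(self, state):
        """Hidden-layer activations for a state vector."""
        return [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
                for row, b in zip(self.w_in, self.bias)]

    def value(self, state):
        """Estimated value: linear readout of the hidden activations."""
        return sum(b * h for b, h in zip(self.beta, self.features(state)))

    def td_update(self, state, reward, next_state, done,
                  alpha=0.05, gamma=0.95, lam=0.8):
        """One TD(lambda) step on the output weights; returns the TD error."""
        h = self.features(state)
        bootstrap = 0.0 if done else gamma * self.value(next_state)
        delta = reward + bootstrap - self.value(state)
        # Accumulating trace: decay the old trace, then add current features.
        self.trace = [gamma * lam * e + hi for e, hi in zip(self.trace, h)]
        self.beta = [b + alpha * delta * e
                     for b, e in zip(self.beta, self.trace)]
        if done:
            # Start the next episode with a fresh trace.
            self.trace = [0.0] * len(self.trace)
        return delta
```

In a cart-pole setting, `state` would typically hold cart position, cart velocity, pole angle, and pole angular velocity, and `td_update` would be called once per simulation step; the returned TD error is the signal an actor network could use to adjust its action output.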








