Reinforcement learning research for the control of the inverted pendulum based on the ELM-BP neural network


Abstract: Controlling an inverted pendulum system with a continuous state space and an unknown model has long been a difficult problem. This paper combines reinforcement learning with ELM-BP neural networks, exploiting their generalization capability, and proposes a learning control strategy built on the Actor-Critic architecture. The BP network serves as the action network, mapping the input state to the action to be executed. The ELM network serves as the evaluation network, approximating the value function and outputting the evaluation. To reduce the size of the sample space and speed up convergence, a sliding time window mechanism and eligibility traces are introduced. After training and learning, the method can effectively handle the inverted pendulum control problem with a continuous state space. In experiments that simulate the inverted pendulum environment in Matlab, the proposed method achieves good results on several common performance measures (number of trials, time required, maximum absolute angle, maximum absolute displacement, and others). The study further improves the application value of reinforcement learning theory in real control systems.
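The evaluation network described in the abstract can be illustrated with a minimal sketch. The record gives no network sizes, activation functions, or update rules, so everything here is an assumption: the names `ELMCritic` and `solve_linear` are hypothetical, the state is reduced to two variables (angle and angular velocity), the ridge term is added only for numerical stability, and the quadratic target is a placeholder rather than the paper's temporal-difference target. The paper's experiments used Matlab; Python is used here only for illustration. The sketch shows the two ideas that can be stated with confidence: an ELM keeps its random hidden layer fixed and solves for the output weights by least squares, and a sliding time window keeps only the most recent samples.

```python
import math
import random
from collections import deque


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


class ELMCritic:
    """Single hidden layer: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_inputs, n_hidden=12, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are drawn once and never trained (the ELM idea).
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden
        self.ridge = ridge  # assumed regularizer, not taken from the paper

    def hidden(self, state):
        return [math.tanh(sum(wi * si for wi, si in zip(w, state)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, states, targets):
        h = [self.hidden(s) for s in states]
        n = len(self.beta)
        # Normal equations with ridge term: (H^T H + ridge * I) beta = H^T t
        gram = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                 for j in range(n)] for i in range(n)]
        rhs = [sum(row[i] * t for row, t in zip(h, targets)) for i in range(n)]
        self.beta = solve_linear(gram, rhs)

    def value(self, state):
        return sum(bi * hi for bi, hi in zip(self.beta, self.hidden(state)))


# Sliding time window: only the 50 most recent samples are kept for refitting.
window = deque(maxlen=50)
rng = random.Random(1)
for _ in range(200):
    angle, velocity = rng.uniform(-0.2, 0.2), rng.uniform(-1.0, 1.0)
    target = -(angle ** 2 + 0.1 * velocity ** 2)  # placeholder value target
    window.append(((angle, velocity), target))

critic = ELMCritic(n_inputs=2)
states, targets = zip(*window)
critic.fit(states, targets)
sse_fit = sum((critic.value(s) - t) ** 2 for s, t in window)
sse_zero = sum(t ** 2 for t in targets)  # error of an untrained critic (beta = 0)
```

Because the output weights come from a single linear solve, refitting on each new window is cheap compared with iterative backpropagation, which is one plausible reason to pair an ELM critic with a BP actor; the record itself does not state the authors' reasoning.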

Author: 王婷婷 WANG Ting-ting (SINOPEC Geophysical Research Institute, Nanjing 211103, China)

Affiliation: SINOPEC Geophysical Research Institute (中国石油化工股份有限公司石油物探技术研究院)

Source: Electronic Design Engineering (《电子设计工程》), 2019, Issue 6, pp. 55-58, 63 (5 pages)

Keywords: reinforcement learning; inverted pendulum; adaptive heuristic algorithm; BP neural network; ELM neural network; continuous space

Classification: TN99 [Electronics and Telecommunications: Signal and Information Processing]


