Operant conditioning reflex learning control scheme based on SMC and Elman network (cited by: 3)


Abstract: For a class of single-input single-output (SISO) high-order nonlinear control systems, a bionic operant conditioning reflex (OCR) learning control scheme is proposed, based on the idea of sliding mode control (SMC) and an Elman network. The scheme uses an Elman network to construct the evaluation function for sliding-surface/action pairs. A reward function is designed from the change in the sliding surface, and the evaluation function is updated according to the reward signal, which in turn updates the action selection probabilities. An entropy defined for each learning round gives a quantitative measure of how the learned knowledge changes. Simulation results on a walking inverted pendulum system show that the bionic OCR learning control scheme achieves balancing control of the pendulum.
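The learning loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear sliding surface `s = lam * e + de/dt`, the sign-based reward, the Boltzmann (softmax) selection rule, the Shannon entropy, and every function and parameter name are assumptions, and a plain list of values stands in for the Elman-network evaluation function.

```python
import math


def sliding_surface(error, error_rate, lam=1.0):
    """Assumed linear sliding surface for a second-order system: s = lam*e + de/dt."""
    return lam * error + error_rate


def reward(s_prev, s_next):
    """Assumed reward: +1 if |s| shrank after the action, -1 if it grew, 0 otherwise."""
    delta = abs(s_next) - abs(s_prev)
    if delta < 0:
        return 1.0
    if delta > 0:
        return -1.0
    return 0.0


def update_value(value, r, learning_rate=0.1):
    """Move one evaluation value toward the reward (stand-in for the network update)."""
    return value + learning_rate * (r - value)


def action_probabilities(values, temperature=1.0):
    """Boltzmann (softmax) selection probabilities over the evaluation values."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [x / total for x in exps]


def entropy(probs):
    """Shannon entropy in bits of the action selection distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# One illustrative round: three candidate actions, all initially equally valued.
values = [0.0, 0.0, 0.0]
entropy_before = entropy(action_probabilities(values))

# Suppose action 0 moved the state closer to the sliding surface.
r = reward(s_prev=sliding_surface(0.4, 0.1), s_next=sliding_surface(0.2, 0.05))
values[0] = update_value(values[0], r, learning_rate=0.5)
entropy_after = entropy(action_probabilities(values))
```

Under these assumptions, rewarding one action raises its selection probability and lowers the entropy of the selection distribution, which is one way to read the abstract's per-round entropy as a measure of acquired knowledge.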

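Authors: Ruan Xiaogang (阮晓钢), Chen Jing (陈静)

Affiliation: College of Electronic Information and Control Engineering, Beijing University of Technology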

Source: Control and Decision (《控制与决策》); indexed in EI, CSCD, and the Peking University Core Journals list; 2011, No. 9, pp. 1398-1401, 1406 (5 pages)

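Funding: National 863 Program of China (2007AA04Z226); National Natural Science Foundation of China (60774077); Beijing Natural Science Foundation (4102011); Key Project of the Beijing Municipal Education Commission (KZ200810005002)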

Keywords: operant conditioning reflex; sliding mode control; Elman network; entropy; inverted pendulum; balancing control

Classification: TP273 [Automation and Computer Technology: Detection Technology and Automation Devices]


