
Operant conditioning reflex learning control scheme based on SMC and Elman network (Cited by 3)
Abstract  For a class of single-input single-output (SISO) higher-order nonlinear control systems, a bionic operant conditioning reflex (OCR) learning control method based on the idea of sliding mode control (SMC) and an Elman network is proposed. The Elman network is used to construct the evaluation function of sliding-surface/action pairs; the reward function is designed from the change of the sliding surface, the evaluation function is updated with the reward signal, and the action-selection probabilities are updated accordingly. By defining an entropy for each learning round, the change in the acquired knowledge during learning is analyzed quantitatively. Simulation results on a walking inverted pendulum system show that the proposed bionic OCR learning control method achieves balance control of the walking inverted pendulum.
Authors  阮晓钢 (Ruan Xiaogang), 陈静 (Chen Jing)
Source  Control and Decision (《控制与决策》, indexed by EI, CSCD, Peking University Core), 2011, No. 9: 1398-1401, 1406 (5 pages)
Funding  National 863 Program of China (2007AA04Z226); National Natural Science Foundation of China (60774077); Beijing Natural Science Foundation (4102011); Key Project of the Beijing Municipal Education Commission (KZ200810005002)
Keywords  operant conditioning reflex; sliding mode control; Elman network; entropy; inverted pendulum; balance control
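To make the learning scheme summarized in the abstract concrete, the following is a minimal, hypothetical Python sketch. It replaces the paper's Elman-network evaluation function with a simple value table for one discretized sliding-surface state, derives the reward from the change of the sliding surface, updates the evaluation with that reward, and computes a per-round entropy of the action-selection probabilities. All identifiers, the tabular simplification, the Boltzmann selection rule, and the numeric parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of one OCR learning step (assumed simplifications throughout):
# the paper's Elman-network evaluation of (sliding surface, action) pairs
# is replaced here by a value table for a single discretized state.

LAMBDA = 1.0             # sliding-surface slope in s = e_dot + LAMBDA * e (assumed form)
ALPHA, BETA = 0.1, 2.0   # learning rate and Boltzmann temperature (assumed)
ACTIONS = np.array([-10.0, 0.0, 10.0])   # candidate control actions (assumed)

def sliding_surface(e, e_dot):
    """Sliding variable of the tracking error; a common SMC choice."""
    return e_dot + LAMBDA * e

def reward(s_prev, s_curr):
    """Reward an action that drives |s| toward zero, penalize it otherwise."""
    return 1.0 if abs(s_curr) < abs(s_prev) else -1.0

def action_probabilities(values):
    """Boltzmann (softmax) selection probabilities from the evaluation values."""
    z = np.exp(BETA * (values - values.max()))
    return z / z.sum()

def round_entropy(probs):
    """Shannon entropy of the selection probabilities; a decreasing entropy
    indicates that knowledge has accumulated (one plausible reading of the
    per-round entropy used in the paper)."""
    return -np.sum(probs * np.log(probs + 1e-12))

# One learning step for a single discretized sliding-surface state:
values = np.zeros(len(ACTIONS))                 # evaluation of each action
s_prev = sliding_surface(e=0.5, e_dot=0.1)

probs = action_probabilities(values)
a_idx = np.random.choice(len(ACTIONS), p=probs) # stochastic action selection

# ... apply ACTIONS[a_idx] to the plant and observe the new error ...
s_curr = sliding_surface(e=0.45, e_dot=0.05)    # placeholder measurement

r = reward(s_prev, s_curr)
values[a_idx] += ALPHA * (r - values[a_idx])    # update the evaluation function

print("entropy of this round:", round_entropy(action_probabilities(values)))
```

The softmax rule above is used only as one common way to map evaluation values to selection probabilities; the probability-update rule in the paper itself may differ.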

References (11)

  • 1 Thorndike E L. Animal intelligence: Experimental studies[M]. New York: Macmillan, 1911.
  • 2 Skinner B F. The behavior of organisms: An experimental analysis[M]. New York: Appleton-Century-Crofts, 1938.
  • 3 Pavlov I P. Conditioned reflexes[M]. Oxford: Oxford University Press, 1927.
  • 4Brembs B. Research: Neurobiology of behavior[EB/OL]. (2010-06-01). http://brembs.net.
  • 5 Brembs B, Plendl W. Double dissociation of PKC and AC manipulations on operant and classical learning in drosophila[J]. Current Biology, 2008, 18(15): 1168-1171.
  • 6 Brembs B. The importance of being active[J]. J of Neurogenetics, 2009, 23(1/2): 120-126.
  • 7Zalama E, Gomez J, Paul M, et al. Adaptive behavior navigation of a mobile robot[J]. IEEE Trans on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2002, 32(1): 160-169.
  • 8 Itoh K, Miwa H, Matsumoto M, et al. Behavior model of humanoid robots based on operant conditioning[C]. IEEE/RAS Int Conf on Humanoid Robots. Piscataway: IEEE, 2005: 220-225.
  • 9 王雪松, 程玉虎, 易建强, 王炜强. Reinforcement learning control of nonlinear systems based on Elman network[J]. Journal of China University of Mining & Technology, 2006, 35(5): 653-657. (Cited by 8)
  • 10 Guo M, Liu Y, Malec J. A new Q-learning algorithm based on the Metropolis criterion[J]. IEEE Trans on Systems, Man, and Cybernetics, Part B: Cybernetics, 2004, 34(5): 2140-2143.

Secondary references (13)

  • 1 闫友彪, 陈元琰. A survey of the main strategies of machine learning[J]. Application Research of Computers, 2004, 21(7): 4-10. (Cited by 57)
  • 2 许世范, 王雪松, 郝继飞. Predicting Model for Complex Production Process Based on Dynamic Neural Network[J]. Journal of China University of Mining and Technology, 2001, 11(1): 20-23. (Cited by 1)
  • 3 MICHIE D, CHAMBERS R A. Boxes: An experiment in adaptive control[J]. Machine Intelligence, 1968, 2(2): 137-152.
  • 4 BARAS J S, BORKAR V S. A learning algorithm for Markov decision processes with adaptive state aggregation[C]// Proceedings of the IEEE Conference on Decision and Control. New Jersey: Piscataway Press, 2000: 3351-3356.
  • 5 MOORE A W, ATKESON C G. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces[J]. Machine Learning, 1995, 21(3): 199-233.
  • 6 LIN C K. A reinforcement learning adaptive fuzzy controller for robots[J]. Fuzzy Sets and Systems, 2003, 137(3): 339-352.
  • 7 KUROZUMI R, FUJISAWA S, YAMAMOTO T, et al. Development of an automatic travel system for electric wheelchairs using reinforcement learning systems and CMACs[C]// Proceedings of the International Joint Conference on Neural Networks. Honolulu: Institute of Electrical and Electronics Engineers Inc. Press, 2002: 1690-1695.
  • 8 SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge: The MIT Press, 1998.
  • 9 WATKINS C J C H, DAYAN P. Technical report: Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
  • 10 SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.

Co-citing literature (7)

Co-cited literature (28)

  • 1 范红. A collision-free path planning method for mobile robots based on evolutionary neural networks[J]. Chinese Journal of Scientific Instrument, 2006, 27(z1): 822-824. (Cited by 5)
  • 2 孙方平, 符秀辉. Research on robot behavior learning in complex environments[J]. Chinese Journal of Scientific Instrument, 2006, 27(z3): 1982-1983. (Cited by 2)
  • 3 刘金琨, 孙富春. Research and development of theory and algorithms of sliding mode variable structure control[J]. Control Theory & Applications, 2007, 24(3): 407-418. (Cited by 575)
  • 4 Skinner B F. The behavior of organisms[M]. New York: Appleton-Century-Crofts, 1938: 18-32.
  • 5 Touretzky D S, Saksida L M. Operant conditioning in Skinnerbots[J]. Adaptive Behavior, 1997, 5(3/4): 219-247.
  • 6 Saksida L M, Raymond S M, Touretzky D S. Shaping robot behavior using principles from instrumental conditioning[J]. Robotics and Autonomous Systems, 1998, 22(3/4): 231-249.
  • 7 Gaudiano P, Chang C. Adaptive obstacle avoidance with a neural network for operant conditioning: Experiments with real robots[C]. IEEE Int Symposium on Computational Intelligence in Robotics and Automation. New York: IEEE Press, 1997: 13-18.
  • 8 Itoh K, Miwa H, Matsumoto M, et al. Behavior model of humanoid robots based on operant conditioning[C]. The 5th IEEE-RAS Int Conf on Humanoid Robots. Tsukuba: IEEE, 2005: 220-225.
  • 9 Pierce D, Kuipers B. Learning to explore and build maps[C]. Proc of the National Conf on Artificial Intelligence. Seattle: AAAI, 1994: 1264-1271.
  • 10 Dean T, Angluin D, Basye K, et al. Inferring finite automata with stochastic output functions and an application to map learning[J]. Machine Learning, 1995, 18(1): 81-108.

Citing literature (3)

Secondary citing literature (5)
