
A Bionic Learning Algorithm Based on Skinner's Operant Conditioning and Control of Robot

Cited by: 3
Abstract: To address the motion balance control problem of a two-wheeled self-balancing robot, a bionic self-learning algorithm combining a BP (back-propagation) neural network with eligibility traces, based on Skinner's operant conditioning theory, is proposed as the robot's learning mechanism. Eligibility traces resolve the effect of delayed reinforcement, speed up learning, and improve reliability; combined with the BP neural network, they form a composite learning algorithm that predicts the behavior evaluation function the robot is about to obtain and, following a probability tendency mechanism, selects with a certain probability the optimal action corresponding to the largest evaluation value. The robot can thus acquire self-learning skills similar to those of humans or animals through interaction, learning and training in an unknown environment, and achieve motion balance control. Finally, simulation experiments on the two-wheeled robot compare the BP algorithm and the composite BP eligibility-trace algorithm, both based on Skinner's operant conditioning theory. The results show that the learning mechanism of the composite bionic self-learning algorithm gives the robot good dynamic performance and fast learning speed, reflecting strong self-learning skill and balance control ability.
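The following is a minimal, illustrative sketch of the learning mechanism described in the abstract, not the authors' implementation: a small feed-forward (BP-style) network estimates the per-action behavior evaluation function, eligibility traces carry delayed reinforcement back over earlier updates, and actions are chosen probabilistically with a bias toward the largest evaluation value. The network sizes, learning parameters, softmax-style selection, and the toy stand-in for the robot dynamics are all assumptions made for demonstration.

```python
# Minimal sketch (assumed structure, not the paper's code): BP-style value network
# with eligibility traces and probability-biased action selection.
import numpy as np

rng = np.random.default_rng(0)

N_STATE, N_HIDDEN, N_ACTION = 4, 8, 3             # assumed sizes, not from the paper
ALPHA, GAMMA, LAMBDA, TAU = 0.05, 0.95, 0.8, 0.5  # learning rate, discount, trace decay, temperature

# Two-layer network: state -> hidden (tanh) -> one evaluation value per action.
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_STATE))
W2 = rng.normal(scale=0.1, size=(N_ACTION, N_HIDDEN))
e1, e2 = np.zeros_like(W1), np.zeros_like(W2)     # one eligibility trace per weight

def forward(s):
    h = np.tanh(W1 @ s)
    return h, W2 @ h                              # hidden activations, action evaluations

def select_action(q):
    """Probability tendency: softmax biased toward the largest evaluation value."""
    p = np.exp((q - q.max()) / TAU)
    p /= p.sum()
    return rng.choice(N_ACTION, p=p)

def step(s, a):
    """Toy stand-in for the two-wheeled robot dynamics: reward is higher near upright."""
    s_next = s + 0.1 * (a - 1) * np.ones(N_STATE) + 0.01 * rng.normal(size=N_STATE)
    return s_next, -abs(s_next[0])                # penalize tilt of the first state variable

s = rng.normal(scale=0.1, size=N_STATE)
for t in range(200):
    h, q = forward(s)
    a = select_action(q)
    s_next, r = step(s, a)
    _, q_next = forward(s_next)

    # TD error between received reinforcement and the predicted evaluation.
    delta = r + GAMMA * q_next.max() - q[a]

    # Back-propagated gradients of q[a] w.r.t. the weights, accumulated into traces.
    grad_W2 = np.zeros_like(W2); grad_W2[a] = h
    grad_W1 = np.outer(W2[a] * (1.0 - h**2), s)
    e2 = GAMMA * LAMBDA * e2 + grad_W2
    e1 = GAMMA * LAMBDA * e1 + grad_W1

    # Trace-weighted update spreads delayed reinforcement over earlier decisions.
    W2 += ALPHA * delta * e2
    W1 += ALPHA * delta * e1
    s = s_next
```

In the paper's setting, the toy `step()` would be replaced by the two-wheeled robot model (tilt angle and velocities as state); the sketch only shows how the trace-weighted BP update and the probability-based action choice fit together.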
Source: Robot (《机器人》), indexed in EI, CSCD, and the Peking University Core Journal list, 2010, No. 1: 132-137 (6 pages).
Funding: National High-Tech R&D Program of China (863 Program) (2007AA04Z226); National Natural Science Foundation of China (60774077); Key Project of the Beijing Municipal Education Commission (KZ200810005002); Beijing Municipal "Strengthening Education through Talents" Program; Specialized Research Fund for the Doctoral Program of Higher Education.
Keywords: Skinner's operant conditioning; eligibility trace; self-learning; balance control; two-wheeled robot
Related Literature

References (14)

  • 1Gao Y, Chen S F, Lu X. Research on reinforcement learning: A survey[J]. Acta Automatica Sinica, 2004, 30(1): 86-100. Cited by: 262
  • 2Zhang W Z, Lü T S. Discussion of several key problems in applying reinforcement learning theory to robots[J]. Computer Engineering and Applications, 2004, 40(4): 69-71. Cited by: 2
  • 3Skinner B F. The behavior of organisms[M]. New York, USA: Copley Publishing Group, 1938.
  • 4Wolf R, Heisenberg M. Basic organization of operant-behavior as revealed in drosophila flight orientation[J]. Journal of Comparative Physiology A, 1991, 169(6): 699-705.
  • 5Rosen B E, Goodwin J M, Vidal J J. Machine operant conditioning[C]//Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Piscataway, NJ, USA: IEEE, 1988: 1500-1501.
  • 6Gaudiano P, Chang C. Adaptive obstacle avoidance with a neural network for operant conditioning: Experiments with real robots[C]//IEEE International Symposium on Computational Intelligence in Robotics and Automation. Piscataway, NJ, USA: IEEE, 1997: 13-18.
  • 7Zalama E, Gomez J, Paul M, et al. Adaptive behavior navigation of a mobile robot[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part A - Systems and Humans, 2002, 32(1): 160-169.
  • 8Itoh K, Miwa H, Matsumoto M, et al. Behavior model of humanoid robots based on operant conditioning[C]//IEEE/RAS International Conference on Humanoid Robots. Piscataway, NJ, USA: IEEE, 2005: 220-225.
  • 9Dominguez S, Zalama E, Garcia-Bermejo J G, et al. Robot learning in a social robot[M]//Lecture Notes in Computer Science, vol.4095. Berlin, Germany: Springer-Verlag, 2006: 691-702.
  • 10Singh S P, Sutton R S. Reinforcement learning with replacing eligibility traces[J]. Machine Learning, 1996, 22(1/2/3): 123-158.

Secondary References (29)

  • 1Jin S N. Classical mechanics: 1st ed[M]. Shanghai: Fudan University Press, 1990.
  • 2Wang Y D. Electric machinery: 1st ed[M]. Zhejiang: Zhejiang University Press, 1990.
  • 3Sutton R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3: 9-44.
  • 4Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8: 279-292.
  • 5Barto A G, Sutton R S, et al. Neuronlike adaptive elements that can solve difficult learning control problems[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1983, 13(5): 834-846.
  • 6Yung N H C, Ye C. An intelligent mobile vehicle navigator based on fuzzy logic and reinforcement learning[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 1999, 29(2): 314-321.
  • 7Fernandez F, Parker L E. Learning in large cooperative multi-robot domains[J]. International Journal of Robotics and Automation, 2001, (4): 217-226.
  • 8Touzet C. Neural reinforcement learning for behavior synthesis[J]. Robotics and Autonomous Systems (special issue on learning robots: the new wave), 1997, 22: 251-281.
  • 9Smith A J. Applications of the self-organising map to reinforcement learning[J]. Neural Networks, 2002, 15: 1107-1124.
  • 10Samejima K, Omori T. Adaptive internal state space construction method for reinforcement learning of a real-world agent[J]. Neural Networks, 1999, 12: 1143-1155.

Co-citing Literature (312)

Co-cited Literature (25)

  • 1Riemann B L, Lephart S M. The sensorimotor system, Part I: The physiologic basis of functional joint stability[J]. Journal of Athletic Training, 2002, 37(1): 71-79.
  • 2Houk J C, Gibson A R. Sensorimotor processing through the cerebellum[M]//New Concepts in Cerebellar Neurobiology. USA: John Wiley & Sons, 1987: 387-416.
  • 3Rosen B E, Goodwin J M, Vidal J J. Machine operant conditioning[C]//Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Piscataway, NJ, USA: IEEE, 1988: 1500-1501.
  • 4Dominguez S, Zalama E, Garcia-Bermejo J G, et al. Robot learning in a social robot[M]//Lecture Notes in Computer Science, vol.4095. Berlin, Germany: Springer-Verlag, 2006: 691-702.
  • 5Hoffmann H. Perception through visuomotor anticipation in a mobile robot[J]. Neural Networks, 2007, 20(1): 22-33.
  • 6Touretzky D S, Saksida L M. Operant conditioning in Skinnerbots[J]. Adaptive Behavior, 1997, 5(3/4): 219-247.
  • 7Saksida L M, Raymond S M, Touretzky D S. Shaping robot behavior using principles from instrumental conditioning[J]. Robotics and Autonomous Systems, 1997, 22(3/4): 231-249.
  • 8Lin Y H. Similarities and differences between classical conditioning and operant conditioning[J]. Bohai Journal (渤海学刊), 1997(1): 73-76.
  • 9Lasserson D. Nervous system and special senses[M]. Beijing: Science Press, 2002.
  • 10Schultz W, Dayan P, Montague P R. A neural substrate of prediction and reward[J]. Science, 1997, 275(5306): 1593-1599.

Citing Literature (3)

Secondary Citing Literature (18)
