An Action Developmental Model Based on Novelty

Abstract: A robot's actions are the basic units of all its activities. For a soccer robot, well-designed and well-implemented actions are an important guarantee that its decisions can be carried out. Traditional reinforcement learning models use a constant learning rate throughout the learning process, which leads to slow convergence and poor adaptability in unknown environments. To address these problems, a new action developmental model, the novelty-based action developmental model, is proposed. During learning, the model uses a state-based learning rate derived from the amnesic average, which is more consistent with the real process of human development. The model adopts an innate value system composed of three parts: reward, punishment, and novelty evaluation. Ball-interception experiments in robot soccer matches show that the model can complete the interception action efficiently and accurately in a constantly changing environment.
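The contrast the abstract draws between a constant learning rate and a state-based amnesic-average learning rate can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the piecewise amnesic function below follows the form commonly used in the developmental-robotics literature, and the constants `t1`, `t2`, `c`, `r`, the class name `AmnesicQLearner`, and the Q-learning style update are all assumptions chosen for illustration. The `value` argument stands in for whatever scalar the value system (reward, punishment, novelty) would produce.

```python
from collections import defaultdict


def amnesic_mu(n, t1=20, t2=200, c=2.0, r=10000.0):
    """Amnesic function mu(n); thresholds and constants are illustrative assumptions."""
    if n <= t1:
        return 0.0                       # early samples: plain running average
    if n <= t2:
        return c * (n - t1) / (t2 - t1)  # ramp up the weight on recent samples
    return c + (n - t2) / r              # keep the weight from vanishing


def learning_rate(n):
    """Weight given to the n-th sample observed in a state (n starts at 1)."""
    return (1.0 + amnesic_mu(n)) / n


class AmnesicQLearner:
    """Hypothetical tabular learner with a per-state, visit-count-based rate."""

    def __init__(self, actions, gamma=0.9):
        self.actions = list(actions)
        self.gamma = gamma
        self.q = defaultdict(float)      # (state, action) -> value estimate
        self.visits = defaultdict(int)   # state -> number of updates so far

    def update(self, state, action, value, next_state):
        # The rate depends on how often this state has been updated,
        # rather than being one constant for the whole learning process.
        self.visits[state] += 1
        alpha = learning_rate(self.visits[state])
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = value + self.gamma * best_next
        self.q[(state, action)] += alpha * (target - self.q[(state, action)])
        return alpha
```

Under these assumed constants the rate starts at 1 for a state's first update, decays roughly like 1/n at first, and then levels off instead of shrinking to zero, so a frequently visited state can still adapt when the environment changes.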

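Author: Cui Ruili

Affiliation: School of Computer Science, Northwestern Polytechnical University

Source: Science Technology and Engineering (《科学技术与工程》), 2011, Issue 5, pp. 975-978 (4 pages)

Keywords: novelty-based action developmental model; reinforcement learning; amnesic average; innate value system

Classification number: TP242.6 [Automation and Computer Technology: Detection Technology and Automatic Devices]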

