Study on Radial Basis Function Networks Based Reinforcement Learning in Robot Soccer (in English)

Cited by: 2


Abstract: Unlike supervised learning, which learns from examples, reinforcement learning requires no prior knowledge and learns from experience. Applying reinforcement learning to large state spaces requires function approximators such as Radial Basis Function Networks (RBFNs) to map inputs to outputs. This study investigates the applicability of RBFN-based reinforcement learning in robot soccer, a dynamic multi-agent environment. The experimental results show that the approach is feasible.

1 Introduction

Robot soccer is a real-time, dynamic, multi-agent environment. Learning algorithms have enabled researchers to handle this kind of complex domain [1]. Reinforcement learning is attracting growing interest for its ability to learn from the environment by evaluating actions through rewards and penalties [2]. Many reinforcement learning algorithms rely on a lookup-table representation of the state [3]. Real-life problems often involve huge state spaces, which make a table-based state representation infeasible. Radial Basis Function Networks (RBFNs) are an attractive method of function approximation for this task [3][4]. This study presents an approach to RBFN-based reinforcement learning in the robot soccer domain.

2 Real Robot Soccer Simulation

We designed a 3 vs. 3 simulation platform for real robot soccer, as shown in Fig. 1. The simulation runs in a server/client architecture. The platform is composed of three components: the server and the two clients. The server provides the virtual field, virtual vision information, the kinematics models of the robots and the ball, and the collision detection and handling model [5].

Fig. 1 Simulator for real robot soccer

The server also receives control commands from the clients and displays the game on the screen. Each client receives the field situation sent by the server, selects a strategy that assigns a basic movement to each robot according to that situation, and sends control commands to the server, which controls the action
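The idea of replacing a lookup table with an RBFN can be illustrated with a minimal sketch. It is not the authors' implementation, which this excerpt does not describe. It assumes a generic Q-learning update with a linear output layer over Gaussian radial basis features. The class name `RBFNQ`, the shared Gaussian `width`, the fixed `centers`, and the learning parameters `alpha` and `gamma` are all hypothetical choices made for illustration.

```python
import math


class RBFNQ:
    """Illustrative Q-function: linear weights over Gaussian RBF features.

    Assumed design (not taken from the paper): fixed centres, one shared
    width, one weight vector per discrete action, Q-learning TD update.
    """

    def __init__(self, centers, width, n_actions, alpha=0.1, gamma=0.9):
        self.centers = centers      # list of centre vectors in state space
        self.width = width          # shared Gaussian width
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.weights = [[0.0] * len(centers) for _ in range(n_actions)]

    def features(self, state):
        """Gaussian activation of every basis function for one state."""
        feats = []
        for c in self.centers:
            d2 = sum((s - ci) ** 2 for s, ci in zip(state, c))
            feats.append(math.exp(-d2 / (2.0 * self.width ** 2)))
        return feats

    def q_values(self, state):
        """Q(state, a) for every action a, as a weighted sum of features."""
        phi = self.features(state)
        return [sum(w * p for w, p in zip(ws, phi)) for ws in self.weights]

    def update(self, state, action, reward, next_state, done=False):
        """One Q-learning step; returns the temporal-difference error."""
        phi = self.features(state)
        q_sa = sum(w * p for w, p in zip(self.weights[action], phi))
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q_values(next_state))
        td_error = target - q_sa
        self.weights[action] = [
            w + self.alpha * td_error * p
            for w, p in zip(self.weights[action], phi)
        ]
        return td_error


# Toy usage on a 1-D state with two centres and two actions.
agent = RBFNQ(centers=[[0.0], [1.0]], width=0.5, n_actions=2)
first_error = agent.update([0.0], action=0, reward=1.0, next_state=[1.0], done=True)
second_error = agent.update([0.0], action=0, reward=1.0, next_state=[1.0], done=True)
```

Because each update adjusts weights shared by all nearby states, experience gathered in one state generalizes to its neighbours, which is what makes this kind of approximator usable when the state space is too large for a table.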

Authors: 罗青 (Luo Qing), 李智军 (Li Zhijun), Iqbal Nadeem, 吕恬生 (Lü Tiansheng)

Affiliation: Research Institute of Robotics, Shanghai Jiao Tong University

Source: Journal of System Simulation (《系统仿真学报》), indexed in CAS and CSCD, 2002, No. 8, pp. 1094-1097 (4 pages)

Keywords: radial basis function networks (RBFNs); reinforcement learning; robot soccer; multi-agent

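Classification codes: TP249 [Automation and Computer Technology: Detection Technology and Automatic Equipment]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]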


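References (1)

1 Luo Qing, Lü Tiansheng, Fei Yanqiong. Research and development of a soccer robot simulation system [J]. Computer Simulation (计算机仿真), 1999, 16(4): 27-30. Cited by: 9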



