Abstract
Unlike supervised learning, which learns from examples, reinforcement learning learns from experience and requires no prior knowledge. Applying reinforcement learning to large state spaces requires function approximators such as Radial Basis Function Networks (RBFNs) to map inputs to outputs. This study investigates the applicability of RBFN-based reinforcement learning in robot soccer, a dynamic multi-agent environment. The experimental results demonstrate that the approach is feasible.
1 Introduction
Robot soccer is a real-time, dynamic, multi-agent environment. Learning algorithms have enabled researchers to handle this kind of complex domain[1]. Reinforcement learning is attracting growing interest for its ability to learn from the environment by evaluating actions through rewards and penalties[2]. Many reinforcement learning algorithms rely on a lookup-table representation of the state[3]. Real-life problems often involve huge state spaces, which make table-based state representation impractical. Radial Basis Function Networks (RBFNs) are an attractive method of function approximation for this task[3][4]. This study presents an approach to RBFN-based reinforcement learning in the robot soccer domain.
2 Real Robot Soccer Simulation
We designed a simulation platform for real robot soccer in the 3 vs. 3 format, as shown in Fig.1. The simulated game runs in a server/client arrangement. The platform is composed of three components: the server and two clients. The server provides the virtual field, virtual vision information, kinematics models of the robots and ball, and a collision test and treatment model[5]. It also receives control commands from the clients and displays the game on the screen.
Fig.1 Simulator for real robot soccer
Each client receives the field situation sent by the server, selects a strategy according to that situation (the strategy in turn selects a basic movement for each robot), and sends control commands to the server, which controls the action
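To make concrete the idea that an RBFN can stand in for a lookup table over a large state space, the following is a minimal sketch and not the method used in this study. It assumes Gaussian basis functions with fixed centers, one linear output per action, and a Q-learning style temporal-difference update. The class name RBFNetwork, the parameters width, alpha and gamma, and the choice of Q-learning are all illustrative assumptions that do not come from the text above.

```python
import math


class RBFNetwork:
    """Hypothetical Gaussian RBF network: fixed centers, one linear output per action."""

    def __init__(self, centers, width, n_actions):
        self.centers = centers          # list of state-space points (tuples)
        self.width = width              # shared Gaussian width (assumed)
        # One weight vector per action, initialised to zero.
        self.weights = [[0.0] * len(centers) for _ in range(n_actions)]

    def features(self, state):
        """Activation of every basis function for a continuous state."""
        feats = []
        for center in self.centers:
            dist_sq = sum((s - c) ** 2 for s, c in zip(state, center))
            feats.append(math.exp(-dist_sq / (2.0 * self.width ** 2)))
        return feats

    def q_values(self, state):
        """Approximate action values as weighted sums of the activations."""
        phi = self.features(state)
        return [sum(w * p for w, p in zip(row, phi)) for row in self.weights]

    def update(self, state, action, reward, next_state, done,
               alpha=0.1, gamma=0.9):
        """One Q-learning step on the output weights; returns the TD error."""
        phi = self.features(state)
        q_sa = sum(w * p for w, p in zip(self.weights[action], phi))
        if done:
            target = reward
        else:
            target = reward + gamma * max(self.q_values(next_state))
        td_error = target - q_sa
        self.weights[action] = [
            w + alpha * td_error * p
            for w, p in zip(self.weights[action], phi)
        ]
        return td_error


# Illustrative use: a 3 x 3 grid of centers over a 2-D state in [0, 1] x [0, 1].
centers = [(x / 2, y / 2) for x in range(3) for y in range(3)]
net = RBFNetwork(centers, width=0.5, n_actions=2)
net.update((0.5, 0.5), action=1, reward=1.0, next_state=(0.5, 0.5), done=True)
```

Because nearby states activate overlapping basis functions, an update at one state also shifts the estimates for its neighbours, which is the generalisation that a table-based representation lacks.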
Source
Journal of System Simulation (《系统仿真学报》)
CAS
CSCD
2002, No. 8, pp. 1094-1097 (4 pages)