Abstract
BP neural networks are widely used in nonlinear control systems. As a supervised learning algorithm, however, back-propagation requires input-output pairs to be supplied in batches for training, and in systems whose optimal policy is unknown such pairs cannot be obtained in advance. Reinforcement learning (RL), by contrast, learns a policy through trial-and-error interaction with a dynamic environment; it works online, needs no supervisor, and approaches the optimal policy gradually. This paper proposes the RBP (reinforcement back-propagation) model, a learning algorithm that combines RL with a BP network. The basic idea is that RL learns the control policy from the environment and the learned knowledge is stored in the network: rather than updating the weights after every step, the network is trained in batch mode after each learning period, so that it gradually converges to the optimal state. A simple example illustrates the validity of the algorithm.
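The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes (learn a policy by RL, then periodically train a BP network on it in batch mode), not the authors' RBP algorithm. Everything concrete here is an assumption made for illustration: the toy chain environment, the use of tabular Q-learning as the RL component, the network size, and all names (`step`, `q_episode`, `TinyBP`, `run_rbp`) and hyperparameters.

```python
import math
import random

N_STATES, GOAL = 7, 3      # hypothetical toy chain: states 0..6, goal in the middle
MOVES = (-1, +1)           # action 0 = left, action 1 = right
START_STATES = [s for s in range(N_STATES) if s != GOAL]

def step(s, a):
    """Deterministic toy environment: reward 1 on reaching GOAL, else 0."""
    s2 = min(max(s + MOVES[a], 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def q_episode(Q, rng, alpha=0.5, gamma=0.9, eps=0.2, max_steps=100):
    """One episode of tabular Q-learning (assumed RL component)."""
    s = rng.choice(START_STATES)
    for _ in range(max_steps):
        if rng.random() < eps or Q[s][0] == Q[s][1]:
            a = rng.randrange(2)              # explore, or break ties at random
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        if done:
            return
        s = s2

class TinyBP:
    """1 input -> tanh hidden layer -> 1 sigmoid output, read as P(action = right)."""
    def __init__(self, hidden, rng):
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_batch(self, data, epochs=300, lr=0.5):
        """Batch back-propagation: accumulate gradients over all pairs, update once per epoch."""
        n, H = len(data), len(self.w1)
        for _ in range(epochs):
            gw1, gb1, gw2, gb2 = [0.0] * H, [0.0] * H, [0.0] * H, 0.0
            for x, t in data:
                h, y = self.forward(x)
                d_out = y - t                 # cross-entropy gradient at the output
                gb2 += d_out
                for j in range(H):
                    gw2[j] += d_out * h[j]
                    d_hid = d_out * self.w2[j] * (1.0 - h[j] ** 2)
                    gw1[j] += d_hid * x
                    gb1[j] += d_hid
            for j in range(H):
                self.w1[j] -= lr * gw1[j] / n
                self.b1[j] -= lr * gb1[j] / n
                self.w2[j] -= lr * gw2[j] / n
            self.b2 -= lr * gb2 / n

def encode(s):
    return (s - GOAL) / GOAL                  # scale the state index to [-1, 1]

def run_rbp(episodes=600, period=100, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    net = TinyBP(hidden=4, rng=rng)
    for ep in range(1, episodes + 1):
        q_episode(Q, rng)
        if ep % period == 0:
            # Periodically store the currently learned greedy policy in the network.
            data = [(encode(s), 1.0 if Q[s][1] > Q[s][0] else 0.0)
                    for s in START_STATES]
            net.train_batch(data)
    return Q, net

if __name__ == "__main__":
    Q, net = run_rbp()
    for s in START_STATES:
        print(s, "right" if net.forward(encode(s))[1] > 0.5 else "left")
```

In this sketch the network is never updated after individual steps; it only sees batches of (state, greedy action) pairs generated by the RL learner at the end of each period, which mirrors the batch-mode update described in the abstract. How the paper itself generates training pairs and chooses the period is not stated here.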
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
PKU Core Journals (北大核心)
2002, No. 8, pp. 981-985 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (69905001)
Keywords
neural network
reinforcement learning algorithm
RBP model
reinforcement learning, BP neural network, reinforcement back-propagation model