Abstract
In complex application scenarios with continuous spaces, classical reinforcement learning methods designed for discrete spaces no longer meet practical needs. Existing reinforcement learning methods for continuous spaces mainly use linear fitting to approximate the state value function and the action selection function, which limits their accuracy. This paper proposes a nonlinear actor-critic approach based on a union neural network (UNN-AC). The method represents the action selection function and the critic value function as a single joint neural network model, and uses that network to fit the state value function and the action selection probability nonlinearly. Compared with existing linear fitting methods, the nonlinear UNN-AC improves the fitting accuracy of both the critic value function and the action selection function. Experimental results show that the UNN-AC algorithm can effectively find approximately optimal policies in continuous spaces. Compared with classical algorithms for continuous action spaces, it converges faster and is more stable.
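The abstract does not give the network architecture, so the following is only a minimal sketch of what "one joint network for both actor and critic" could look like: a single shared hidden layer feeding a Gaussian policy head and a state value head. Every name here (`UnionNetwork`, `td_error`, the layer sizes, the `tanh` activation, the fixed standard deviation) is an assumption made for illustration and is not taken from the paper. Only the forward pass and the temporal-difference error are shown; training is omitted.

```python
import math
import random


class UnionNetwork:
    """Hypothetical joint network: one shared hidden layer, two output heads."""

    def __init__(self, state_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small uniform random weights; the initialisation is an assumption.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_hidden = init(hidden_dim, state_dim)  # shared by actor and critic
        self.b_hidden = [0.0] * hidden_dim
        self.w_mean = init(1, hidden_dim)[0]         # actor head: action mean
        self.w_value = init(1, hidden_dim)[0]        # critic head: state value
        self.log_std = 0.0                           # fixed std = exp(0) = 1

    def forward(self, state):
        """Return (action mean, action std, state value) for one state."""
        hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        mean = sum(w * h for w, h in zip(self.w_mean, hidden))
        value = sum(w * h for w, h in zip(self.w_value, hidden))
        return mean, math.exp(self.log_std), value

    def sample_action(self, state, rng):
        """Draw a continuous action from the Gaussian policy."""
        mean, std, _ = self.forward(state)
        return rng.gauss(mean, std)

    def log_prob(self, state, action):
        """Log density of the action under the Gaussian policy."""
        mean, std, _ = self.forward(state)
        z = (action - mean) / std
        return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)


def td_error(net, state, reward, next_state, gamma=0.99):
    """One-step TD error r + gamma * V(s') - V(s) from the critic head."""
    _, _, v = net.forward(state)
    _, _, v_next = net.forward(next_state)
    return reward + gamma * v_next - v
```

In a full actor-critic loop the TD error would typically scale the gradient of `log_prob` for the actor head and serve as the regression error for the critic head, with both gradients flowing into the shared hidden layer. Whether UNN-AC updates its network this way is not stated in the abstract.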
Authors
杨金鸿
谭斌
皇甫立
熊璋
YANG Jinhong; TAN Bin; HUANGFU Li; XIONG Zhang (Systems Engineering Research Institute of CSSC, Beijing 100094, China; College of Computer Science & Technology, Beihang University, Beijing 100192, China)