Abstract
To address the low overall training efficiency that reinforcement learning suffers from because of its training samples, this study proposes a new reinforcement learning algorithm based on generative adversarial networks (GRL). The method uses the set of real experience samples as a template to generate theoretically feasible virtual samples. The agent is trained once on these virtual samples and merges the good ones into the real sample set, which improves the quality of the training samples. The mountain-car simulation experiment is implemented on OpenAI Gym to verify the effectiveness of applying the generative adversarial network idea to reinforcement learning. Compared with the Q-learning algorithm, the objective function of GRL, observed while tracking the data output, converges in roughly fewer than 40 iterations, which greatly increases learning speed and alleviates the lagging learning seen in existing techniques.
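The sample-augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The generator here is a plain noise jitter standing in for the adversarially trained generator, and the names `generate_virtual`, `augment`, and `score`, as well as the acceptance rule (reward at or above a threshold), are hypothetical choices made only for illustration. The abstract does not state how "good" virtual samples are identified.

```python
import random

def generate_virtual(real, noise=0.05, seed=0):
    """Stand-in generator (assumption): jitter the states of real transitions.

    Each transition is (state, action, reward, next_state). The paper uses a
    generative adversarial network here; this sketch only perturbs the states.
    """
    rng = random.Random(seed)
    return [
        (s + rng.uniform(-noise, noise), a, r, s2 + rng.uniform(-noise, noise))
        for (s, a, r, s2) in real
    ]

def augment(real, virtual, score, threshold):
    """Merge virtual transitions whose score reaches the threshold into the real set."""
    accepted = [t for t in virtual if score(t) >= threshold]
    return real + accepted, accepted

# Two hypothetical real transitions from a one-dimensional task.
real = [(0.0, 1, -1.0, 0.1), (0.5, 2, 0.0, 0.6)]
virtual = generate_virtual(real)

# Hypothetical acceptance rule: keep virtual transitions with reward >= -0.5.
merged, accepted = augment(real, virtual, score=lambda t: t[2], threshold=-0.5)
```

In this sketch the scoring function is passed in as a parameter, so an agent-based criterion (for example, one derived from the agent's value estimates) could replace the reward threshold without changing the merge logic.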
Author
YU Junjie (俞君杰), Jiangsu Electric Power Information Technology Co. Ltd., Nanjing 210013, China
Source
Microcomputer Applications (《微型电脑应用》)
2022, No. 6, pp. 174-176, 190 (4 pages)
Keywords
reinforcement learning
generative adversarial network
training samples
relative entropy
function convergence