Abstract
Reinforcement learning has drawn growing attention for its success on sequential decision-making problems, but it still suffers from low data efficiency when high-dimensional states are used as input. One cause is that the agent has difficulty extracting effective features from a high-dimensional space. To improve data efficiency, this paper proposes cGDA (cGANs-based Data Augment), a data augmentation method for reinforcement learning tasks. The method uses conditional generative adversarial nets (cGANs) to model the dynamics of the environment: the conditional generative model takes the current state and action as input and outputs the next state as augmented data. During training, the agent is trained on real and augmented data simultaneously, which helps it extract useful knowledge from the different data quickly. On the Atari100K benchmark, across 26 discrete-control environments, cGDA achieves higher performance in 16 environments than methods that use data augmentation, and in 14 environments than methods that do not use data augmentation.
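The augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_generator` is a hypothetical stand-in for a trained cGAN generator, the `Transition` layout and all function names are assumptions, and reusing the real reward for the generated transition is also an assumption, since the abstract does not say how rewards are handled.

```python
import random
from typing import Callable, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: List[float]
    action: int
    reward: float
    next_state: List[float]


# Assumed generator interface: (state, action, noise) -> generated next state.
Generator = Callable[[Sequence[float], int, Sequence[float]], List[float]]


def toy_generator(state: Sequence[float], action: int,
                  noise: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained cGAN generator (not the paper's network)."""
    return [s + 0.1 * action + 0.01 * z for s, z in zip(state, noise)]


def augment(batch: Sequence[Transition], generator: Generator,
            rng: random.Random) -> List[Transition]:
    """Build one generated transition per real transition."""
    augmented = []
    for t in batch:
        # Sample one Gaussian noise value per state dimension.
        noise = [rng.gauss(0.0, 1.0) for _ in t.state]
        # The generator is conditioned on the current state and action.
        fake_next = generator(t.state, t.action, noise)
        # Assumption: the real reward is reused for the generated transition.
        augmented.append(Transition(t.state, t.action, t.reward, fake_next))
    return augmented


def mixed_batch(batch: Sequence[Transition], generator: Generator,
                rng: random.Random) -> List[Transition]:
    """Real transitions followed by their generated counterparts."""
    return list(batch) + augment(batch, generator, rng)


rng = random.Random(0)
real = [
    Transition([0.0, 1.0], 1, 1.0, [0.1, 1.1]),
    Transition([0.5, 0.5], 0, 0.0, [0.5, 0.5]),
]
batch = mixed_batch(real, toy_generator, rng)
print(len(batch))
```

In this sketch the agent's update would consume `batch`, so each real transition is paired with one generated transition that shares its state and action. The one-to-one mixing ratio is an illustrative choice only.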
Authors
项宇
秦进
袁琳琳
XIANG Yu; QIN Jin; YUAN Linlin (College of Computer Science & Technology, Guizhou University, Guiyang 550025; College of Information Engineering, Guizhou Open University, Guiyang 550023)
Source
《计算机与数字工程》
2024, No. 6, pp. 1739-1745 (7 pages)
Computer & Digital Engineering
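Funding
Supported by the Guizhou Provincial Science and Technology Foundation Project (No. 黔科合基础[2020]1Y275)
and the Guizhou Provincial Science and Technology Plan Project (No. 黔科合基础[2019]1130).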
Keywords
reinforcement learning
data augmentation
data efficiency
conditional generative adversarial nets
Atari games