Abstract
Deep reinforcement learning uses the strong visual perception of deep learning to solve sequential decision-making problems in complex environments. However, because its sampling efficiency is low, learning the important features of complex, high-dimensional inputs is difficult. To extract information from sequential samples more effectively, this paper proposes a model structure that integrates Spatial Relationship Reasoning and Memory Reasoning (SRRMR) into deep reinforcement learning. The model has two parts. Spatial relationship reasoning uses an attention mechanism as its relation-learning method to implicitly infer the relationship between any two entities, and the query vectors of this attention mechanism incorporate the content of memory reasoning. Memory reasoning takes the features and relations extracted from the input image as the input to memory, uses self-attention to reason over and interact with the memory components, and stores the results of this interaction in the memory units, so that the memory store fuses spatial information with memory information. The SRRMR model was trained and evaluated on a range of Atari games. The results show that combining spatial relationship reasoning with memory reasoning converges to better results with fewer environment interactions in 7 of 15 games, and that the memory reasoning network yields improvements in 12 of 15 games. The approach thus improves the agent's learning efficiency, makes more efficient use of the samples in each sequence, and raises the sample efficiency of reinforcement learning.
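The abstract describes the SRRMR data flow only at a high level. The NumPy sketch below is one possible reading of it, under assumptions that are ours and not the paper's: each cell of a CNN feature map (with appended coordinates) is treated as an entity; a single attention head is used; memory is fused into the queries by adding a projected mean of the memory slots; and the memory update lets memory slots attend over the concatenation of memory and relation features, followed by an LSTM-style sigmoid gate in the spirit of relational memory cores. All class, function, and parameter names (`RelationMemorySketch`, `feature_map_to_entities`, `mem_slots`, and so on) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def feature_map_to_entities(fmap):
    # Hypothetical entity extraction: each cell of an (H, W, C) feature map becomes
    # one entity, with normalized (x, y) coordinates appended so attention can use position.
    h, w, c = fmap.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    coords = np.stack([xs, ys], axis=-1)
    return np.concatenate([fmap, coords], axis=-1).reshape(h * w, c + 2)


class RelationMemorySketch:
    """Hypothetical single-head sketch; not the authors' SRRMR implementation."""

    def __init__(self, entity_dim, d_model, mem_slots, seed=0):
        rng = np.random.default_rng(seed)

        def init(n_in, n_out):
            return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

        self.W_embed = init(entity_dim, d_model)
        # Spatial relation reasoning: query/key/value projections for entities.
        self.W_q, self.W_k, self.W_v = (init(d_model, d_model) for _ in range(3))
        # Projects a summary of memory into the entity queries (memory-fused query).
        self.W_mem_q = init(d_model, d_model)
        # Memory reasoning: memory slots attend over [memory; relation features].
        self.U_q, self.U_k, self.U_v = (init(d_model, d_model) for _ in range(3))
        # Gate deciding how much of each memory slot is overwritten.
        self.W_gate, self.U_gate = init(d_model, d_model), init(d_model, d_model)
        # Memory starts empty (all zeros); an assumption of this sketch.
        self.memory = np.zeros((mem_slots, d_model))

    def step(self, fmap):
        x = feature_map_to_entities(fmap) @ self.W_embed  # (N, d_model)
        # Spatial relation reasoning with memory-fused queries:
        # every entity query is shifted by the projected mean memory slot.
        mem_summary = self.memory.mean(axis=0) @ self.W_mem_q
        q = x @ self.W_q + mem_summary
        rel, rel_weights = attention(q, x @ self.W_k, x @ self.W_v)
        rel = x + rel  # residual connection
        # Memory reasoning: self-attention from memory slots over memory + relation features.
        mem_and_input = np.concatenate([self.memory, rel], axis=0)
        cand, _ = attention(self.memory @ self.U_q,
                            mem_and_input @ self.U_k,
                            mem_and_input @ self.U_v)
        gate = 1.0 / (1.0 + np.exp(-(cand @ self.W_gate + self.memory @ self.U_gate)))
        # Gated write: convex mix of a tanh-bounded candidate and the old memory.
        self.memory = gate * np.tanh(cand) + (1.0 - gate) * self.memory
        # Pooled state representation a policy/value head could consume.
        state = np.concatenate([rel.max(axis=0), self.memory.reshape(-1)])
        return state, rel_weights


# Usage: three steps on random 7x7x16 "feature maps" (stand-ins for CNN outputs).
model = RelationMemorySketch(entity_dim=16 + 2, d_model=32, mem_slots=4)
rng = np.random.default_rng(1)
for t in range(3):
    state, w = model.step(rng.normal(size=(7, 7, 16)))
print(state.shape, w.shape)  # (160,) (49, 49)
```

Because the memory starts at zero, the memory term has no effect on the queries at the first step; from the second step on, the attention weights over entities depend on what the memory has stored, which is the coupling between the two parts that the abstract describes. The gate keeps every memory value in [-1, 1], since each update is a convex combination of the old slot and a tanh-bounded candidate.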
Authors
LIU Hui-Ling (刘卉玲); LIU Peng (刘鹏); BAI Chen-Jia (白辰甲)
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150008; Shanghai AI Laboratory, Shanghai 200232
Source
Chinese Journal of Computers (《计算机学报》), 2023, No. 4, pp. 814-826 (13 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
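Funding
Key Program of the National Natural Science Foundation of China (No. 51935005)
Basic Research Project (No. JCKY20200603C010)
Natural Science Foundation of Heilongjiang Province (No. LH2021F023)
Science and Technology Program of Heilongjiang Province (No. GA21C031)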
Keywords
spatial relational reasoning
memory reasoning
deep reinforcement learning
attention mechanism
state representation