Abstract
Memory replay plays an important role in improving the learning and decision-making abilities of organisms. Studies have shown that biological memory replay is carried out mainly by place cells in the hippocampus, and that it is diverse in both the order of activation and the specific locations activated. However, most existing methods for simulating hippocampal memory replay take a single form and simulate replay in only one direction or in only some situations, so they struggle to reproduce the hippocampal replay mechanism well. Simulating the replay function of hippocampal place cells from multiple aspects, in line with the biological mechanism, to improve the learning and decision-making performance of agents therefore has significant research value and application prospects. For static grid scenarios, this paper uses a combined reinforcement learning mechanism to simulate the diversity of hippocampal reactivation: a bidirectional search model that alternates between trajectory sampling and prioritized sweeping is designed to simulate the reactivation of memories at different locations in the hippocampus. In addition, online and offline learning are used to simulate the memory mechanisms of organisms in the awake and sleep states, respectively, so as to better reproduce the hippocampal replay process. Furthermore, for changing dynamic scenarios, a deep bidirectional search model with a "one set of parameters, two-stage update" function is designed to improve the learning and decision-making performance of agents in dynamic environments. Agent navigation experiments in complex static and dynamic grid environments, together with performance comparisons against other reinforcement learning algorithms, verify the effectiveness of the proposed model.
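The alternation of trajectory sampling and prioritized sweeping described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' model: it is a generic Dyna-style Q-learning agent, and the 4x4 grid, the reward of 1 at the goal, all hyperparameters, and the function names (`forward_replay`, `backward_replay`, `train`) are assumptions made purely for illustration.

```python
import heapq
import random

# Hypothetical 4x4 static grid: start (0, 0), goal (3, 3), deterministic moves.
SIZE, GOAL = 4, (3, 3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPSILON, THETA = 0.5, 0.95, 0.1, 1e-4  # assumed values


def step(state, action):
    """Deterministic transition; moving into a wall leaves the agent in place."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def greedy(Q, state, rng):
    """Greedy action with random tie-breaking."""
    best = max(Q.get((state, a), 0.0) for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q.get((state, a), 0.0) == best])


def td_error(Q, s, a, r, s2):
    """Signed one-step Q-learning TD error; the goal is terminal."""
    target = r if s2 == GOAL else r + GAMMA * max(Q.get((s2, b), 0.0) for b in ACTIONS)
    return target - Q.get((s, a), 0.0)


def forward_replay(Q, model, state, rng, n=5):
    """Trajectory sampling: roll the learned model forward from `state`."""
    for _ in range(n):
        a = greedy(Q, state, rng)
        if (state, a) not in model:  # unexplored transition: stop the rollout
            break
        r, s2 = model[(state, a)]
        Q[(state, a)] = Q.get((state, a), 0.0) + ALPHA * td_error(Q, state, a, r, s2)
        state = s2


def backward_replay(Q, model, preds, queue, n=5):
    """Prioritized sweeping: back up the largest TD errors, then their predecessors."""
    for _ in range(n):
        if not queue:
            break
        _, (s, a) = heapq.heappop(queue)
        r, s2 = model[(s, a)]
        Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * td_error(Q, s, a, r, s2)
        for ps, pa in preds.get(s, ()):
            pr, _ = model[(ps, pa)]
            p = abs(td_error(Q, ps, pa, pr, s))
            if p > THETA:
                heapq.heappush(queue, (-p, (ps, pa)))  # max-heap via negation


def train(episodes=30, seed=0, max_steps=200):
    rng = random.Random(seed)
    Q, model, preds, queue = {}, {}, {}, []
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = (0, 0), 0
        while s != GOAL and steps < max_steps:
            a = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(Q, s, rng)
            s2, r = step(s, a)
            model[(s, a)] = (r, s2)
            preds.setdefault(s2, set()).add((s, a))
            p = abs(td_error(Q, s, a, r, s2))
            if p > THETA:
                heapq.heappush(queue, (-p, (s, a)))
            # "Awake" (online) replay: alternate the two search directions.
            backward_replay(Q, model, preds, queue)
            forward_replay(Q, model, s2, rng)
            s, steps = s2, steps + 1
        # "Sleep" (offline) replay between episodes: a longer backward sweep.
        backward_replay(Q, model, preds, queue, n=50)
        steps_per_episode.append(steps)
    return Q, steps_per_episode
```

Calling `train()` returns the learned Q-table and the number of real steps taken in each episode. In this sketch the backward sweep spreads value from the goal toward earlier states, while the forward rollout refines values along the path the agent is likely to take next. The deep, dynamic-environment variant mentioned in the abstract is not covered here.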
Authors
朱觐镳
吴一帆
王东署
ZHU Jin-biao; WU Yi-fan; WANG Dong-shu (School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou Henan 450001, China; Innovation Center of Intelligent Systems, Longmen Laboratory, Luoyang Henan 471000, China)
Source
《控制理论与应用》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2024, No. 10, pp. 1753-1764 (12 pages)
Control Theory & Applications
Funding
Supported by the National Natural Science Foundation of China (62173309, 61873245).
Keywords
memory-guided
decision-making
hippocampus
memory replay
trajectory sampling
prioritized sweeping