Exploration algorithm based on intrinsic curiosity and self-imitation learning (SIL)
Abstract: To address the sparse rewards and missing information that deep reinforcement learning algorithms face in partially observable environments, a proximal policy optimization (PPO) algorithm that combines a curiosity module with self-imitation learning (SIL) is proposed. The algorithm uses a random network to generate experience samples during exploration, then applies prioritized experience replay to select high-quality samples. SIL imitates the best sequence trajectories and updates a new policy network that guides exploration behavior. Ablation and comparison experiments were carried out in the Minigrid environment. The results show that the proposed algorithm has a clear advantage in convergence speed and can complete more complex exploration tasks in partially observable environments.
Authors: LÜ Xianglin; ZANG Zhaoxiang; LI Sibo; ZOU Yaobin (Hubei Key Laboratory of Intelligent Vision Monitoring for Hydropower Engineering, China Three Gorges University, Yichang 443002, China; School of Computer and Information, China Three Gorges University, Yichang 443002, China)
Source: Modern Electronics Technique (《现代电子技术》), Peking University Core Journal list, 2024, No. 16, pp. 137-144 (8 pages)
Funding: National Natural Science Foundation of China (61502274); Natural Science Foundation of Hubei Province (2015CFB336)
Keywords: curiosity module; self-imitation learning; deep reinforcement learning; proximal policy optimization; random network; prioritized experience replay
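
The abstract names three components (a random-network curiosity signal, prioritized experience replay, and SIL) without giving their formulas. The sketch below is one possible reading, not the authors' implementation. It assumes the "random network" acts as a fixed random target whose output a trainable predictor tries to match, that replay priorities are the clipped advantage max(R - V, 0) from the original SIL formulation, and that networks can be stood in for by single tanh layers. All function and parameter names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, out_dim):
    # Hypothetical stand-in for a network: one random linear layer.
    return rng.normal(size=(in_dim, out_dim)) / np.sqrt(in_dim)

def intrinsic_reward(obs, target_w, predictor_w):
    # Assumed curiosity bonus: squared error between a trainable predictor
    # and a fixed random target network, computed per observation.
    target = np.tanh(obs @ target_w)
    pred = np.tanh(obs @ predictor_w)
    return np.mean((pred - target) ** 2, axis=-1)

def sil_priorities(returns, values, eps=1e-6):
    # Assumed priority: clipped advantage, so only transitions whose return
    # beat the value estimate receive a large priority.
    return np.maximum(returns - values, 0.0) + eps

def sample_prioritized(priorities, batch_size, alpha=0.6):
    # Sample buffer indices with probability proportional to priority^alpha.
    probs = priorities ** alpha
    probs = probs / probs.sum()
    return rng.choice(len(priorities), size=batch_size, p=probs)

def sil_policy_loss(log_probs, returns, values):
    # SIL-style policy loss: raise the log-probability of past actions,
    # weighted by the clipped advantage (zero when the return was worse).
    adv = np.maximum(returns - values, 0.0)
    return float(np.mean(-log_probs * adv))
```

In a full training loop the intrinsic reward would be added to the environment reward before the PPO update, and the SIL loss would be applied to batches drawn with `sample_prioritized`. How the paper weights these terms is not stated in the abstract.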