
A Deep Reinforcement Learning Algorithm Based on Double Experience Memory and Prioritized Experience Replay
Abstract: Intelligent agents have been widely applied in fields such as gaming, robot control, autonomous driving, and natural language processing. However, the sparse reward problem remains a major obstacle to agent learning and exploration. This paper proposes an improved algorithm that stores experience samples in dual experience replay buffers and incorporates prioritized experience sampling to improve sampling efficiency. In addition, the reward function is restructured into multiple segments that provide guiding rewards to steer the agent's learning. Experimental results show that the improved algorithm outperforms the traditional Deep Q-Network (DQN) algorithm and the on-policy Advantage Actor-Critic (A2C) algorithm, effectively addressing the sparse reward problem and improving the agent's learning efficiency. The effectiveness of the improved algorithm is validated through experiments in the classic CartPole game environment.
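The abstract describes three mechanisms: two separate experience pools, prioritized sampling within each pool, and a multi-segment (shaped) reward. The sketch below illustrates one way these pieces could fit together for CartPole; it is not the authors' implementation, and the pool-splitting criterion (successful vs. ordinary transitions), the 50/50 sampling mix, and the reward weights are illustrative assumptions.

```python
import numpy as np

class PrioritizedBuffer:
    """One replay pool with proportional prioritized sampling (priority ** alpha)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def push(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:        # drop the oldest entry when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors, eps=1e-3):
        for i, err in zip(idx, td_errors):          # larger TD error -> sampled more often
            self.priorities[i] = abs(err) + eps

    def __len__(self):
        return len(self.data)


class DualReplayMemory:
    """Two pools: 'good' keeps transitions judged successful, 'normal' keeps the rest;
    each mini-batch mixes samples from both pools (split criterion assumed)."""
    def __init__(self, capacity, good_ratio=0.5):
        self.good = PrioritizedBuffer(capacity)
        self.normal = PrioritizedBuffer(capacity)
        self.good_ratio = good_ratio                # assumed 50/50 mixing ratio

    def push(self, transition, success):
        (self.good if success else self.normal).push(transition)

    def sample(self, batch_size):
        n_good = min(int(batch_size * self.good_ratio), len(self.good))
        n_norm = min(batch_size - n_good, len(self.normal))
        batch = []
        if n_good:
            batch += self.good.sample(n_good)[0]
        if n_norm:
            batch += self.normal.sample(n_norm)[0]
        return batch


def shaped_reward(x, theta, x_limit=2.4, theta_limit=0.21):
    """Multi-segment reward for CartPole: separate cart-position and pole-angle
    terms replace the sparse per-step reward (weights and limits assumed)."""
    r_pos = 1.0 - abs(x) / x_limit
    r_ang = 1.0 - abs(theta) / theta_limit
    return 0.5 * r_pos + 0.5 * r_ang
```

In a DQN-style training loop, the agent would push each transition into DualReplayMemory, sample mixed mini-batches for the Q-network update, and refresh priorities with the resulting TD errors.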
Authors: LI Sibo, ZANG Zhaoxiang, LYU Xianglin (Hubei Key Laboratory of Intelligent Visual Monitoring for Hydropower Engineering, Three Gorges University, Yichang 443002, China; School of Computer and Information, Three Gorges University, Yichang 443002, China)
Source: Changjiang Information & Communications, 2023, Issue 11, pp. 73-76.
Funding: National Natural Science Foundation of China (No. 61502274); Natural Science Foundation of Hubei Province (No. 2015CFB336).
Keywords: sparse reward; double experience replay memory; prioritized experience replay; reward function; deep reinforcement learning