Deep Deterministic Policy Gradient Algorithm Based on Multi-dimensional Priority Experience Replay Mechanism
Abstract: To further improve the convergence speed of the deep deterministic policy gradient (DDPG) algorithm on continuous-action reinforcement learning tasks, a DDPG algorithm based on a multi-dimensional priority experience replay mechanism is proposed. First, to address the low utilization of sample data in the experience replay mechanism, the temporal-difference (TD) error is used as an index to classify samples. Second, two indicators, scarcity and novelty, are used to score the samples, and the scarcity and novelty scores are combined by weighting to obtain the final priority score. Finally, the proposed multi-dimensional priority experience replay mechanism is applied to the DDPG algorithm, and the improved algorithm is tested on continuous control reinforcement learning tasks. The experimental results show that the improved algorithm converges faster.
Authors: RONG Chuiting (荣垂霆); LI Haijun (李海军); ZHU Hengwei (朱恒伟); LIU Yanxu (刘延旭); YU Shijun (于士军) (College of Computer and Information Engineering, Dezhou University, Dezhou, Shandong 253023, China)
Source: Journal of Dezhou University (《德州学院学报》), 2024, No. 4, pp. 21-27, 32 (8 pages in total)
Keywords: deep deterministic policy gradient algorithm; reinforcement learning; experience replay mechanism; multi-dimensional priority
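
The abstract names three steps (classify samples by TD error, score them by scarcity and novelty, combine the two scores with weights) but does not give the formulas. The following is a minimal Python sketch of how such a replay buffer could look, not the authors' implementation. The class name `MultiDimPriorityBuffer`, the two-class TD threshold, the scarcity definition (one minus the class share), the novelty definition (decaying with replay count), and the weight `w_scarcity` are all assumptions made for illustration.

```python
import random
from collections import Counter, deque


class MultiDimPriorityBuffer:
    """Hypothetical replay buffer mixing a scarcity and a novelty score."""

    def __init__(self, capacity=10_000, td_threshold=1.0, w_scarcity=0.5, seed=0):
        # Each entry is [transition, td_class, replay_count].
        self.buffer = deque(maxlen=capacity)
        self.td_threshold = td_threshold  # assumed cut between "low" and "high" TD error
        self.w_scarcity = w_scarcity      # assumed weight; novelty gets 1 - w_scarcity
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        # Step 1: classify the sample by the magnitude of its TD error.
        td_class = "high" if abs(td_error) >= self.td_threshold else "low"
        self.buffer.append([transition, td_class, 0])

    def priorities(self):
        # Step 2: score every sample on scarcity and novelty (assumed definitions),
        # then combine the two scores with a weighted sum.
        class_sizes = Counter(entry[1] for entry in self.buffer)
        total = len(self.buffer)
        scores = []
        for _, td_class, replay_count in self.buffer:
            scarcity = 1.0 - class_sizes[td_class] / total  # rarer class scores higher
            novelty = 1.0 / (1.0 + replay_count)            # rarely replayed scores higher
            scores.append(self.w_scarcity * scarcity
                          + (1.0 - self.w_scarcity) * novelty)
        return scores

    def sample(self, batch_size):
        # Step 3: draw a batch with probability proportional to the priority.
        # Keep w_scarcity below 1 so the novelty term keeps every weight positive.
        weights = self.priorities()
        picked = self.rng.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        for i in picked:
            self.buffer[i][2] += 1  # replayed samples lose novelty
        return [self.buffer[i][0] for i in picked]
```

In a DDPG training loop, `add` would be called with each new transition and its TD error from the critic, and `sample` would replace uniform minibatch sampling. The sketch samples with replacement and never refreshes the TD class after a critic update; a fuller version would likely do both differently.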