Deep Deterministic Policy Gradient Algorithm Based on a Multi-dimensional Priority Experience Replay Mechanism

Original title (Chinese): 基于多维度优先级经验回放机制的深度确定性策略梯度算法

Abstract: To further improve the convergence speed of the deep deterministic policy gradient (DDPG) algorithm on reinforcement learning tasks with continuous actions, a DDPG algorithm based on a multi-dimensional priority experience replay mechanism is proposed. First, to address the low utilization of sample data in the experience replay mechanism, samples are classified using the temporal-difference (TD) error. Second, samples are scored on two indicators, scarcity and novelty, and the two scores are combined with weights to obtain the final priority score. Finally, the resulting multi-dimensional priority experience replay mechanism is applied to the DDPG algorithm, and the improved algorithm is tested on continuous control reinforcement learning tasks. The experimental results show that the convergence speed of the improved algorithm increases to some extent.
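The pipeline outlined in the abstract (classify by TD error, score by scarcity and novelty, combine with weights) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the median threshold used for classification, the single mixing weight `weight`, the proportional normalization into sampling probabilities, and all function names. The scarcity and novelty scores are taken as given inputs because their definitions are not stated.

```python
from statistics import median


def classify_by_td_error(td_errors, threshold=None):
    """Split sample indices into high- and low-|TD error| groups.

    Assumption: the median of |TD error| is used as the default threshold;
    the paper's actual classification rule is not given in the abstract.
    """
    abs_td = [abs(e) for e in td_errors]
    if threshold is None:
        threshold = median(abs_td)
    high = [i for i, e in enumerate(abs_td) if e > threshold]
    low = [i for i, e in enumerate(abs_td) if e <= threshold]
    return high, low


def combined_priority(scarcity, novelty, weight=0.5):
    """Weighted combination of per-sample scarcity and novelty scores.

    Assumption: one weight in [0, 1] mixes the two scores linearly.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * s + (1.0 - weight) * n for s, n in zip(scarcity, novelty)]


def sampling_probabilities(priorities, eps=1e-6):
    """Turn priority scores into replay sampling probabilities.

    Assumption: probabilities are proportional to priority plus a small
    constant, so that no sample has exactly zero chance of being replayed.
    """
    shifted = [p + eps for p in priorities]
    total = sum(shifted)
    return [p / total for p in shifted]


# Hypothetical values, for illustration only.
high_idx, low_idx = classify_by_td_error([0.1, -2.0, 0.5, 3.0])
priorities = combined_priority([1.0, 0.0], [0.0, 1.0], weight=0.25)
probabilities = sampling_probabilities(priorities)
```

In this sketch the two classes returned by `classify_by_td_error` could be scored and sampled separately, but how the paper combines classification with scoring is not stated in the abstract.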

Authors: RONG Chuiting (荣垂霆); LI Haijun (李海军); ZHU Hengwei (朱恒伟); LIU Yanxu (刘延旭); YU Shijun (于士军) (College of Computer and Information Engineering, Dezhou University, Dezhou, Shandong 253023, China)

Affiliation: College of Computer and Information Engineering, Dezhou University (德州学院计算机与信息学院)

Source: Journal of Dezhou University (《德州学院学报》), 2024, No. 4, pp. 21-27, 32 (8 pages in total)

Keywords: deep deterministic policy gradient algorithm; reinforcement learning; experience replay mechanism; multi-dimensional priority

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
