Cooperative Transportation Method Based on Episodic Memory Reinforcement Learning (original title: 基于情景记忆式强化学习的协作运输方法)

Abstract: To address the low utilization of samples stored in the memory pool of episodic memory algorithms, this paper proposes a cooperative multi-agent reinforcement learning algorithm that combines episodic memory with a value function decomposition framework, called the episodic memory value decomposition (EMVD) algorithm. In its episodic memory component, EMVD updates the memory pool according to the squared temporal-difference (TD) error, so that the pool always retains the episodic memory samples that matter most for improving learning. The episodic memory algorithm is also combined with a neural network, which improves the convergence speed of the algorithm. To apply EMVD to a robot cooperative transportation task, the positions of the robots and of the transportation target are taken as the state, and a reward function is designed. Simulation results show that EMVD finds the optimal policy for the robot cooperative transportation task with improved convergence speed.
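The abstract states only that the memory pool is updated according to the squared TD error; it does not give the exact update rule. The following is a minimal Python sketch of one plausible reading, assuming a fixed-capacity pool that evicts the sample with the smallest squared TD error when a more informative sample arrives. The class name `TDPriorityMemoryPool`, the helper `td_error`, and the one-step TD target are illustrative assumptions, not the authors' implementation.

```python
import heapq
import itertools
from typing import Any, List, Tuple


def td_error(reward: float, gamma: float, q_next_max: float, q_current: float) -> float:
    """One-step TD error: r + gamma * max_a' Q(s', a') - Q(s, a) (assumed form)."""
    return reward + gamma * q_next_max - q_current


class TDPriorityMemoryPool:
    """Hypothetical fixed-capacity pool keeping the samples with the largest squared TD error."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Min-heap of (squared_td_error, insertion_id, sample); the root is the
        # least important sample and is the first candidate for eviction.
        self._heap: List[Tuple[float, int, Any]] = []
        # Unique insertion ids break ties so that samples are never compared.
        self._counter = itertools.count()

    def add(self, sample: Any, delta: float) -> bool:
        """Offer a sample with TD error `delta`; return True if it was stored."""
        entry = (delta ** 2, next(self._counter), sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if entry[0] > self._heap[0][0]:
            # The new sample is more important than the weakest stored one.
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def samples(self) -> List[Any]:
        """Stored samples, ordered from largest to smallest squared TD error."""
        return [sample for _, _, sample in sorted(self._heap, reverse=True)]


pool = TDPriorityMemoryPool(capacity=2)
pool.add("a", 0.5)    # squared TD error 0.25
pool.add("b", -0.1)   # squared TD error 0.01
pool.add("c", 2.5)    # squared TD error 6.25, evicts "b"
print(pool.samples())
```

A min-heap is used here so that finding and replacing the least important sample costs O(log n) per insertion. How EMVD actually ranks, refreshes, or samples from the pool should be taken from the full paper.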

Authors: ZHOU Weiqing (周维庆); ZHANG Zhen (张震); SONG Guange (宋光乐); LIU Mingyang (刘明阳); SONG Tingting (宋婷婷)

Affiliations: School of Automation, Qingdao University, Qingdao 266071, China; Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, China; School of Intelligent Manufacturing, Weifang University of Science and Technology, Weifang 261000, China; Vehicle Maintenance Department, Third Operation Center of Qingdao Metro Operation Co., Ltd., Qingdao 266071, China

Source: Control Engineering of China (《控制工程》), 2024, No. 7, pp. 1203-1210 (8 pages). Indexed in CSCD and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (Grant No. 61903209).

Keywords: reinforcement learning; multi-agent reinforcement learning; episodic memory; robot cooperative transportation; temporal-difference error

Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]