Abstract
To address the low utilization of samples in the memory pool of episodic memory algorithms, a cooperative multi-agent reinforcement learning algorithm that combines episodic memory with the value function decomposition framework, called the episodic memory value decomposition (EMVD) algorithm, is proposed. In its episodic memory component, EMVD updates the memory pool according to the squared temporal difference (TD) error, so that the pool always retains the episodic samples that matter most for improving learning. The episodic memory mechanism is further combined with a neural network, which speeds up convergence. To apply EMVD to a robot cooperative transportation task, the positions of the robots and the transportation target are taken as the state, and a reward function is designed. Simulation results show that EMVD finds the optimal policy for the robot cooperative transportation task and converges faster.
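The abstract does not spell out the pool-update rule, so the Python sketch below only illustrates the idea: a bounded memory pool ordered by squared TD error, which evicts the least important sample when a more important one arrives. The class name EpisodicMemoryPool, its methods, and the eviction policy are illustrative assumptions, not the paper's implementation.

```python
import heapq
import itertools

class EpisodicMemoryPool:
    """Bounded pool that keeps the transitions with the largest squared
    TD error (illustrative sketch; the paper's exact rule may differ)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap = []                  # min-heap of (priority, seq, transition)
        self._seq = itertools.count()    # tie-breaker so transitions are never compared

    def add(self, transition: dict, td_error: float) -> None:
        priority = td_error ** 2         # squared TD error as importance
        entry = (priority, next(self._seq), transition)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif priority > self._heap[0][0]:
            # Pool is full: drop the currently least important sample and
            # keep the new one, so high-TD-error samples are retained.
            heapq.heapreplace(self._heap, entry)

    def samples(self) -> list:
        return [t for _, _, t in self._heap]
```

A transition would be stored after each environment step, e.g. pool.add({"s": s, "a": a, "r": r, "s_next": s_next}, td_error=delta), where delta is the TD error computed from the current value estimates.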
Authors
ZHOU Weiqing, ZHANG Zhen, SONG Guangle, LIU Mingyang, SONG Tingting
(School of Automation, Qingdao University, Qingdao 266071, China; Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, China; School of Intelligent Manufacturing, Weifang University of Science and Technology, Weifang 261000, China; Vehicle Maintenance Department, Third Operation Center of Qingdao Metro Operation Co., Ltd., Qingdao 266071, China)
Source
Control Engineering of China (《控制工程》), 2024, No. 7, pp. 1203-1210 (8 pages). Indexed in CSCD and the Peking University Core Journals list (北大核心).
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61903209).
Keywords
reinforcement learning
multi-agent reinforcement learning
episodic memory
robot cooperative transportation
temporal difference error