Heterogeneous Multi-Agent Reinforcement Learning Method Based on Value Function Decomposition and Communication Learning Mechanism
Abstract: Many real-world systems can be modeled as multi-agent systems in which multiple agents interact with the environment to learn and make decisions. Multi-agent reinforcement learning (MARL) provides an effective way to develop such systems and has achieved remarkable results in various complex sequential decision-making tasks. However, MARL faces challenges such as non-stationarity and the curse of dimensionality. Value function decomposition is one of the most popular MARL approaches: by decomposing the global value function into local individual value functions, it greatly reduces the dimensionality of the joint action space and alleviates the curse of dimensionality; moreover, each agent can select actions according to its individual value function alone, which mitigates the non-stationarity caused by interactions between agents. Value function decomposition methods based on the centralized-training-with-decentralized-execution paradigm have therefore been widely studied. However, existing value decomposition methods generally lack a communication mechanism and perform poorly on multi-agent tasks that require communication learning. At the same time, most current communication mechanisms are designed for homogeneous multi-agent environments and do not consider heterogeneous scenarios. In heterogeneous scenarios, information sharing between agents is not straightforward because of the heterogeneity of the agents' action or observation spaces. If this heterogeneity cannot be modeled effectively, the communication mechanism becomes ineffective and can even harm multi-agent cooperation.

To address these challenges, this paper proposes a heterogeneous multi-agent reinforcement learning framework that integrates value function decomposition with a communication learning mechanism. Specifically: (1) unlike methods built on homogeneous graph convolutional networks, the framework uses a heterogeneous graph convolutional network to fuse the agents' heterogeneous feature information into effective embeddings; (2) the embeddings produced by the communication learning module, together with each agent's local observation history, are used to compute per-agent action values for selecting and coordinating the agents' actions; (3) the whole method is trained effectively by jointly optimizing a designed mutual information loss and the loss of the value function decomposition module. The proposed method retains the scalability and stability of value function decomposition while promoting better collaboration and decision-making through diverse information exchange between heterogeneous agents. To the best of our knowledge, this work is the first attempt to combine graph-convolution-based communication learning with value function learning to develop heterogeneous multi-agent systems, offering a new direction for heterogeneous MARL.

Experiments on two heterogeneous multi-agent platforms show that the proposed method learns more effective policies than the baselines, improving the average reward by 13% and the average win rate by 24% on the two platforms respectively. In addition, the feasibility of the method in real systems is verified in a traffic signal control scenario.
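For readers unfamiliar with the value-decomposition backbone the abstract builds on, below is a minimal PyTorch sketch of a QMIX-style monotonic mixing network, the best-known instance of this family. The hypernetwork layout and layer sizes here are illustrative assumptions, not the paper's exact design.

```python
# A minimal sketch of QMIX-style monotonic value mixing (not the paper's
# exact module): per-agent Q-values are mixed into a global Q_tot while
# keeping dQ_tot/dQ_i >= 0 via non-negative, state-conditioned weights.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Hypernetworks generate mixing weights from the global state.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                nn.ReLU(),
                                nn.Linear(embed_dim, 1))
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        batch = agent_qs.size(0)
        qs = agent_qs.view(batch, 1, self.n_agents)
        # abs() enforces monotonicity of Q_tot in every agent's Q-value,
        # so the argmax of Q_tot decomposes into per-agent argmaxes.
        w1 = torch.abs(self.w1(state)).view(batch, self.n_agents, self.embed_dim)
        b1 = self.b1(state).view(batch, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(qs, w1) + b1)          # (batch, 1, embed_dim)
        w2 = torch.abs(self.w2(state)).view(batch, self.embed_dim, 1)
        b2 = self.b2(state).view(batch, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(batch, 1)   # Q_tot
```

The monotonicity constraint is what lets training stay centralized while execution stays decentralized: maximizing each agent's individual Q-value also maximizes the mixed global value.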
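Point (1) of the framework relies on heterogeneous graph convolution to bridge agents with different observation spaces. The sketch below shows one plausible layer: type-specific encoders first project heterogeneous observations into a shared space, then messages are aggregated with relation-specific transforms, in the style of R-GCN. The names `obs_dims` and `edges` and this particular aggregation rule are assumptions for illustration, not the paper's actual layer.

```python
# A minimal sketch of a heterogeneous graph convolution layer for agent
# communication (assumed structure, not the paper's exact design).
import torch
import torch.nn as nn

class HeteroGraphConvLayer(nn.Module):
    def __init__(self, obs_dims: dict, n_relations: int, hidden: int = 64):
        super().__init__()
        # One encoder per agent type resolves heterogeneous observation spaces.
        self.encoders = nn.ModuleDict(
            {t: nn.Linear(d, hidden) for t, d in obs_dims.items()})
        # One message transform per relation (e.g. per ordered type pair).
        self.rel_weights = nn.ModuleList(
            [nn.Linear(hidden, hidden, bias=False) for _ in range(n_relations)])
        self.self_loop = nn.Linear(hidden, hidden, bias=False)

    def forward(self, obs: dict, edges: list) -> torch.Tensor:
        # obs: {agent_id: (type_name, obs_tensor)}
        # edges: [(src_id, dst_id, relation_index), ...]
        h = {i: self.encoders[t](o) for i, (t, o) in obs.items()}
        out = {i: self.self_loop(h[i]) for i in h}
        for src, dst, rel in edges:
            # Relation-aware message passing: the transform depends on
            # which kind of agent is talking to which.
            out[dst] = out[dst] + self.rel_weights[rel](h[src])
        # Stack per-agent embeddings in agent-id order: (n_agents, hidden).
        return torch.stack([torch.relu(out[i]) for i in sorted(out)])
```

Per point (2), each agent's resulting embedding would then be concatenated with its local observation history and fed to its individual action-value head.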
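Point (3) jointly trains a mutual information loss with the value-decomposition loss. The sketch below shows one plausible shape of such a joint objective, using an InfoNCE-style lower bound as the MI term; the estimator, the pairing of messages with observation features, and the weight `beta` are all assumptions, since the abstract does not specify the paper's MI loss.

```python
# A sketch of a joint objective (assumed form): TD loss on the decomposed
# global value plus an InfoNCE-style mutual-information regularizer that
# keeps each agent's message informative about its own observation features.
import torch
import torch.nn.functional as F

def joint_loss(q_tot, q_tot_target, reward, gamma, msg, obs_feat, beta=0.1):
    # TD loss of the value function decomposition module.
    td_target = reward + gamma * q_tot_target.detach()
    td_loss = F.mse_loss(q_tot, td_target)
    # InfoNCE lower bound on I(message; observation features):
    # matching (agent's own) pairs sit on the diagonal of the score matrix.
    logits = msg @ obs_feat.t()                 # (n_agents, n_agents)
    labels = torch.arange(msg.size(0), device=msg.device)
    mi_loss = F.cross_entropy(logits, labels)   # minimizing maximizes the bound
    return td_loss + beta * mi_loss
```

Because the per-agent utilities that feed Q_tot already condition on the communication embeddings, gradients from the TD term also shape the messages; the MI term additionally prevents the messages from collapsing into uninformative noise.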
Authors: DU Wei; DING Shi-Fei; GUO Li-Li; ZHANG Jian; DING Ling (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; Mine Digitization Engineering Research Center of Ministry of Education (China University of Mining and Technology), Xuzhou, Jiangsu 221116; College of Intelligence and Computing, Tianjin University, Tianjin 300350)
Source: 《计算机学报》 Chinese Journal of Computers (indexed in EI, CAS, CSCD, Peking University Core), 2024, Issue 6, pp. 1304-1322 (19 pages)
Funding: Supported by the National Natural Science Foundation of China (62276265, 61976216).
Keywords: value function decomposition; heterogeneous multi-agent reinforcement learning; communication mechanism; graph neural network; mutual information; traffic signal control