
Multi-agent reinforcement learning based on attentional message sharing
Cited by: 3
Abstract: Communication is an important way to achieve effective cooperation among agents in a non-omniscient environment. When the number of agents is large, the communication process generates redundant messages. To handle communication messages effectively, a multi-agent reinforcement learning algorithm based on attentional message sharing, called AMSAC (Attentional Message Sharing multi-agent Actor-Critic), was proposed. Firstly, a message sharing network was built for effective communication among agents; information sharing was achieved through message reading and writing by the agents, solving the problem of lack of communication among agents in non-omniscient environments with complex tasks. Secondly, in the message sharing network, communication messages were processed adaptively by an attentional message sharing mechanism that weights messages from different agents by their importance, solving the problem that large-scale multi-agent systems cannot effectively identify and utilize messages during communication. Then, in the centralized Critic network, the Native Critic was used to update the Actor network parameters according to the Temporal Difference (TD) advantage policy gradient, so that the action values of agents were evaluated effectively. Finally, during execution, each agent's distributed Actor network made decisions based on its own observations and the information from the message sharing network. Experimental results in the StarCraft Multi-Agent Challenge (SMAC) environment show that, compared with multi-agent reinforcement learning methods such as Native Actor-Critic (Native AC) and Game Abstraction Communication (GA-Comm), AMSAC improves the average win rate by 4 to 32 percentage points in four different scenarios. AMSAC's attentional message sharing mechanism provides a reasonable solution for processing communication messages among agents in a multi-agent system, and has broad application prospects in both transportation hub control and unmanned aerial vehicle collaboration.
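The abstract describes the architecture only in prose. As a rough orientation, the PyTorch sketch below shows what an attention-based message write/read step and a TD-advantage actor update could look like. This is a minimal sketch under assumed design choices, not the paper's implementation; all names (AttentionMessageShare, actor_loss, msg_dim) and the self-masking choice are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMessageShare(nn.Module):
    """Hypothetical sketch of attentional message sharing: each agent
    writes a message from its observation, then reads an attention-weighted
    combination of the other agents' messages."""
    def __init__(self, obs_dim: int, msg_dim: int):
        super().__init__()
        self.write = nn.Linear(obs_dim, msg_dim)  # message writing
        self.query = nn.Linear(obs_dim, msg_dim)  # reader's query
        self.key   = nn.Linear(msg_dim, msg_dim)
        self.value = nn.Linear(msg_dim, msg_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim) -> aggregated messages: (n_agents, msg_dim)
        msgs = self.write(obs)
        q, k, v = self.query(obs), self.key(msgs), self.value(msgs)
        scores = q @ k.t() / k.shape[-1] ** 0.5   # (n_agents, n_agents)
        # Assumed design choice: an agent attends only to others' messages.
        self_mask = torch.eye(obs.shape[0], dtype=torch.bool, device=obs.device)
        attn = F.softmax(scores.masked_fill(self_mask, float('-inf')), dim=-1)
        return attn @ v                           # importance-weighted read

def actor_loss(log_prob, reward, v_s, v_next, gamma=0.99):
    """TD advantage policy gradient: A = r + gamma * V(s') - V(s),
    loss = -log pi(a | o, m) * A, with A treated as a constant."""
    advantage = (reward + gamma * v_next - v_s).detach()
    return -(log_prob * advantage).mean()
```

Consistent with the centralized-training, decentralized-execution scheme in the abstract, the critic producing v_s and v_next would be centralized and used only during training, while at execution time each agent's actor consumes only its own observation plus the aggregated message.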
Authors: ZANG Rong; WANG Li; SHI Tengfei (College of Data Science, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China; North Automatic Control Technology Institute, Taiyuan, Shanxi 030006, China)
Source: Journal of Computer Applications (CSCD, Peking University Core Journal), 2022, No. 11, pp. 3346-3353 (8 pages)
Funding: National Natural Science Foundation of China (61872260)
Keywords: multi-agent system; agent cooperation; deep reinforcement learning; agent communication; attention mechanism; policy gradient