Recent progress in multi-agent deep reinforcement learning (MADRL) has made it more practical for real-world tasks, but its relatively poor scalability and the constraint of partial observability pose further challenges for its performance and deployment. Based on the intuitive observation that human society can be regarded as a large-scale partially observable environment, in which each person communicates with neighbors and remembers his or her own experience, we propose a novel network structure called the hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we model the multi-agent system as a graph, use a novel graph convolution structure to achieve communication between heterogeneous neighboring agents, and adopt a recurrent unit that enables agents to record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that can learn stochastic policies with a configurable target action entropy. Building on these techniques, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant, SAC-HGRN. Experimental results on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements over four MADRL baselines, but also demonstrate the interpretability, scalability, and transferability of the proposed model.
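To make the combination of graph communication and recurrent memory concrete, here is a minimal NumPy sketch of one generic time step: each agent encodes its local observation, averages features with its graph neighbors, and folds the result into a GRU memory. It is an illustration under stated assumptions, not HGRN itself: plain mean aggregation stands in for HGRN's own graph convolution (the abstract does not specify its hierarchical design or how it handles heterogeneous agents), the weights are random, and all function names and the four-agent line topology are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def neighbor_mean(adj, h):
    """Average each agent's features with those of its graph neighbors."""
    a = adj + np.eye(adj.shape[0])        # add self-loops so an agent keeps its own view
    deg = a.sum(axis=1, keepdims=True)    # how many agents each row averages over
    return (a @ h) / deg


def graph_conv(adj, h, w):
    """One mean-aggregation graph convolution layer followed by a ReLU."""
    return np.maximum(neighbor_mean(adj, h) @ w, 0.0)


def gru_cell(x, h_prev, p):
    """Standard GRU update: gates decide how much old memory each agent keeps."""
    z = sigmoid(x @ p["Wz"] + h_prev @ p["Uz"])               # update gate
    r = sigmoid(x @ p["Wr"] + h_prev @ p["Ur"])               # reset gate
    h_new = np.tanh(x @ p["Wh"] + (r * h_prev) @ p["Uh"])     # candidate memory
    return (1.0 - z) * h_prev + z * h_new


def step(adj, obs, h_prev, w_enc, w_comm, p):
    """Encode local observations, communicate once, then update each agent's memory."""
    x = np.maximum(obs @ w_enc, 0.0)
    x = graph_conv(adj, x, w_comm)
    return gru_cell(x, h_prev, p)


rng = np.random.default_rng(0)
N, OBS, HID = 4, 6, 8
# Hypothetical topology: four agents in a line, each linked to its immediate neighbors.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
w_enc = rng.normal(scale=0.3, size=(OBS, HID))
w_comm = rng.normal(scale=0.3, size=(HID, HID))
gru = {k: rng.normal(scale=0.3, size=(HID, HID)) for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

h = np.zeros((N, HID))                  # empty memory at the first time step
for _ in range(3):                      # three steps of random local observations
    obs = rng.normal(size=(N, OBS))
    h = step(adj, obs, h, w_enc, w_comm, gru)
print(h.shape)
```

Because messages pass through the adjacency matrix and the weights are shared by all agents, adding agents changes only the shapes of `adj`, `obs`, and `h`, not the weight matrices; this is one common reason graph-based designs can scale to larger teams.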
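The abstract's maximum-entropy method learns stochastic policies with a configurable target action entropy. A minimal sketch of that idea, assuming a discrete action space and a Boltzmann (softmax) policy over Q-values as in soft Q-learning: since the policy's entropy increases with the temperature α (for Q-values that are not all equal), a bisection search over log α can find the temperature that hits a chosen target entropy for a single state. The bisection is an illustrative stand-in, not necessarily how Soft-HGRN or SAC-HGRN set α; SAC-style methods commonly learn α by gradient descent on a temperature loss instead. Function names and Q-values are hypothetical.

```python
import numpy as np


def boltzmann(q, alpha):
    """Softmax policy pi(a) proportional to exp(Q(a) / alpha), shifted for numerical stability."""
    p = np.exp((q - q.max()) / alpha)
    return p / p.sum()


def entropy(p):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def temperature_for_entropy(q, target, lo=1e-4, hi=1e4, iters=100):
    """Bisection on log(alpha), relying on entropy rising monotonically with alpha.

    If the target lies outside the reachable range, the result ends up near `lo` or `hi`.
    """
    log_lo, log_hi = np.log(lo), np.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if entropy(boltzmann(q, np.exp(mid))) < target:
            log_lo = mid   # policy too greedy: raise the temperature
        else:
            log_hi = mid   # policy too random: lower the temperature
    return float(np.exp(0.5 * (log_lo + log_hi)))


q_values = np.array([1.0, 2.0, 3.0, 0.5])      # hypothetical Q-values for one state
target = 0.5 * np.log(len(q_values))           # half of the maximum possible entropy
alpha = temperature_for_entropy(q_values, target)
pi = boltzmann(q_values, alpha)
print(round(entropy(pi), 4), round(target, 4))
```

Moving `target` toward `np.log(len(q_values))` yields a more uniform, exploratory policy; moving it toward zero yields a near-greedy one.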
Erratum to: Front Inform Technol Electron Eng, 2023, 24(1): 117-130, https://doi.org/10.1631/FITEE.2200073. Unfortunately, the funding information was incorrect. It should be the National Key R&D Program of China (No. 2018AAA0102302).
Funding: Project supported by the National Key R&D Program of China (No. 2018AAA010230).