Local Observation Reconstruction for Ad-Hoc Cooperation

Abstract: Multi-agent reinforcement learning has received considerable attention from researchers in recent years. A key problem in this field is ad-hoc cooperation, i.e., how to adapt to teammates whose types and numbers vary. Existing methods either rely on strong prior-knowledge assumptions or use hard-coded protocols for cooperation, so they lack generality and cannot be extended to more general ad-hoc cooperation scenarios. To address this problem, this paper proposes a local observation reconstruction algorithm for ad-hoc cooperation. The algorithm uses attention mechanisms and sampling networks to reconstruct local observations, which enables it to recognize and make full use of high-dimensional state representations in different situations and to achieve zero-shot generalization in ad-hoc cooperation scenarios. The algorithm is compared with representative algorithms on the StarCraft micromanagement environment and on ad-hoc cooperation scenarios, and the analysis verifies its effectiveness.

Authors: CHEN Hao; YANG Likun; YIN Qiyue; HUANG Kaiqi (CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China)

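Affiliations: Center for Research on Intelligent System and Engineering (CRISE), Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; CAS Center for Excellence in Brain Science and Intelligence Technology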

Source: Journal of University of Chinese Academy of Sciences (《中国科学院大学学报（中英文）》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 1, pp. 117-126 (10 pages)

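Funding: Supported by the National Natural Science Foundation of China (61876181), the Beijing Science and Technology Innovation Program (Z19110000119043), and the Youth Innovation Promotion Association, the Chinese Academy of Sciences, and a Chinese Academy of Sciences project (QYZDB-SSWJSC006).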

Keywords: multi-agent; deep reinforcement learning; credit assignment; Ad-Hoc cooperation

CLC number: TP183 [Automation and Computer Technology - Control Theory and Control Engineering]

