Knowledge transfer in multi-agent reinforcement learning with incremental number of agents (cited by: 1)

Abstract: This paper studies reinforcement learning for cooperative multi-agent systems (MAS) with an incremental number of agents. Existing multi-agent reinforcement learning approaches deal with MAS that have a fixed number of agents, and can learn well-performing policies. However, when the number of agents increases, the previously learned policies may not perform well in the new scenario. The new agents need to learn from scratch to find optimal policies together with the others, which may slow down the learning of the whole team. To solve this problem, we propose a new algorithm that takes full advantage of the previously learned knowledge and transfers it from the previous agents to the new agents. Since the previous agents have been trained well in the source environment, they are treated as teacher agents in the target environment. Correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the teacher agents' networks to adapt them to the current environment. Then, the teacher agents take the observations of the student agents as input, and output advised actions and values as supervising information. Finally, the student agents combine the reward from the environment with the supervising information from the teacher agents, and learn the optimal policies with modified loss functions. By taking full advantage of the teacher agents' knowledge, the search space for the student agents is reduced significantly, which can accelerate the learning of the whole system. The proposed algorithm is verified in several multi-agent simulation environments, and the experimental results demonstrate its efficiency.
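
The three steps in the abstract (adapting the teacher's input nodes, querying the teacher with the student's observation, and training the student on a modified loss) can be illustrated with a rough sketch. The abstract does not state the network architecture or the form of the modified loss, so everything below is an assumption made for illustration: zero-padding the first-layer weights, a cross-entropy term toward the teacher's advised action distribution, a squared-error term toward the teacher's value, and the weight `beta`. The function names are hypothetical and are not taken from the paper.

```python
import math


def expand_input_weights(weights, n_new_inputs):
    """Pad a first-layer weight matrix with zero columns (assumed adaptation).

    weights: list of rows, one per hidden unit, each of length old_input_dim.
    Zero weights on the added inputs mean the extra observation entries
    initially have no effect on the teacher's output.
    """
    return [row + [0.0] * n_new_inputs for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def student_loss(student_logits, student_value, teacher_probs,
                 teacher_value, td_target, beta=0.5):
    """Environment TD loss plus teacher-guided terms (assumed form).

    td_target comes from the environment reward; teacher_probs and
    teacher_value are the teacher's outputs for the student's observation.
    beta is a hypothetical weight on the supervising information.
    """
    probs = softmax(student_logits)
    td_loss = (td_target - student_value) ** 2
    # Cross-entropy between the teacher's advice and the student's policy.
    advice_loss = -sum(t * math.log(p) for t, p in zip(teacher_probs, probs))
    # Squared error between the teacher's value and the student's value.
    value_loss = (teacher_value - student_value) ** 2
    return td_loss + beta * (advice_loss + value_loss)
```

With `beta=0` the student trains on the environment signal alone. A larger `beta` pulls the student toward the teacher's advice, which is one plausible way the supervising information could narrow the student's search space.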

Authors: LIU Wenzhang, DONG Lu, LIU Jian, SUN Changyin

Affiliations: School of Automation; School of Cyber Science and Engineering

Source: Journal of Systems Engineering and Electronics (系统工程与电子技术（英文版）), indexed in SCIE, EI, CSCD; 2022, No. 2, pp. 447-460 (14 pages)

Funding: This work was supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (62173251, 61921004, U1713209), the Natural Science Foundation of Jiangsu Province of China (BK20202006), and the Guangdong Provincial Key Laboratory of Intelligent Decision and Cooperative Control.

Keywords: knowledge transfer; multi-agent reinforcement learning (MARL); new agents

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
