
Efficient Exploration for Multi-Agent Reinforcement Learning via Transferable Successor Features (Cited by: 1)

Abstract: In multi-agent reinforcement learning (MARL), the behavior of each agent can influence the learning of the others, and the agents must search an exponentially enlarged joint-action space. Hence, it is challenging for multi-agent teams to explore the environment; agents may converge to suboptimal policies and fail to solve some complex tasks. To improve exploration efficiency as well as performance on MARL tasks, in this paper we propose a new approach that transfers knowledge across tasks. Unlike traditional MARL algorithms, we first assume that the reward functions can be computed as linear combinations of a shared feature function and a set of task-specific weights. Then, we define a set of basic MARL tasks in the source domain and pre-train them as basic knowledge for later use. Finally, once the weights for a target task are available, it is easier to obtain a well-performing policy for exploring the target domain. Hence, the learning process of the agents on target tasks is sped up by making full use of the previously learned basic knowledge. We evaluate the proposed algorithm on two challenging MARL tasks: cooperative box-pushing and non-monotonic predator-prey. The experimental results demonstrate improved performance compared with state-of-the-art MARL algorithms.
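The decomposition the abstract refers to can be made concrete with a small sketch. In the standard successor-features formulation, the reward factors as r(s, a) = phi(s, a)^T w_task for a shared feature map phi and task-specific weights w_task, so the action-value of any fixed policy pi factors as Q^pi(s, a) = psi^pi(s, a)^T w_task, where psi^pi is the expected discounted sum of features under pi (the "successor features"). The following Python sketch is a minimal illustration of how a library of pre-trained source policies could be reused on a new task via generalized policy improvement; all names (q_values, gpi_action, psi_source, w_target, the array shapes) are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def q_values(psi, w):
    """Q(s, a) = psi(s, a)^T w for every action.

    psi: array of shape (n_actions, n_features), successor features of one
         pre-trained source policy at the current (joint) state.
    w:   array of shape (n_features,), task-specific reward weights.
    """
    return psi @ w

def gpi_action(psi_source, w_target):
    """Generalized policy improvement over a library of source policies.

    psi_source: array (n_policies, n_actions, n_features) holding the
                successor features of each pre-trained source-task policy.
    w_target:   array (n_features,), reward weights of the target task.

    Returns the action maximizing Q over all source policies, giving an
    informed starting policy for exploration in the target task.
    """
    q = psi_source @ w_target             # (n_policies, n_actions)
    return int(np.argmax(q.max(axis=0)))  # best action under the best policy

# Toy usage: 3 source policies, 4 joint actions, 5 reward features.
rng = np.random.default_rng(0)
psi_source = rng.normal(size=(3, 4, 5))
w_target = rng.normal(size=5)
print(gpi_action(psi_source, w_target))
```

Because psi^pi does not depend on the reward weights, switching to a new task only changes the inexpensive dot products psi^T w; this is what allows the pre-trained source knowledge to accelerate exploration in the target domain, as the abstract claims.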
Source: IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, Issue 9, pp. 1673-1686 (14 pages).
Funding: the National Key R&D Program of China (2021ZD0112700, 2018AAA0101400); the National Natural Science Foundation of China (62173251, 61921004, U1713209); and the Natural Science Foundation of Jiangsu Province of China (BK20202006).