Abstract
In this paper, we propose a clique-based sparse reinforcement learning (RL) algorithm for solving cooperative tasks. The aim is to accelerate the learning speed of the original sparse RL algorithm and to make it applicable to tasks decomposed in a more general manner. First, a transition function is estimated and used to update the Q-value function, which greatly reduces the learning time. Second, agents are divided into cliques, each of which is responsible only for a specific subtask; this yields a more reasonable decomposition. In this way, the global Q-value function is decomposed into the sum of several simpler local Q-value functions. This decomposition is expressed by a factor graph and exploited by the general max-plus algorithm to obtain the greedy joint action. Experimental results show that the proposed approach outperforms comparable methods.
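As a minimal illustrative sketch (not the paper's implementation), the additive decomposition of the global Q-value into local clique Q-values, and the resulting greedy joint-action selection, might look as follows. The clique structure, the Q-tables, and the use of exhaustive maximization (in place of max-plus message passing on the factor graph, which avoids enumerating all joint actions) are all assumptions made for this toy example:

```python
import itertools

# Hypothetical example: 3 agents, 2 actions each, two cliques (assumed structure).
# The global Q-value is decomposed as a sum of local clique Q-values:
#   Q(a) = Q1(a0, a1) + Q2(a1, a2)
cliques = [(0, 1), (1, 2)]  # agent indices in each clique (illustrative)

# Illustrative local Q-tables, indexed by each clique's joint action.
local_q = [
    {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 2.0, (1, 1): 0.0},  # Q1 over agents 0, 1
    {(0, 0): 0.0, (0, 1): 3.0, (1, 0): 1.0, (1, 1): 0.5},  # Q2 over agents 1, 2
]

def global_q(joint_action):
    """Global Q-value as the sum of local clique Q-values."""
    return sum(q[tuple(joint_action[i] for i in clique)]
               for clique, q in zip(cliques, local_q))

def greedy_joint_action(n_agents=3, n_actions=2):
    """Exhaustive argmax over joint actions; max-plus on the factor graph
    finds the same maximizer via message passing, without full enumeration."""
    return max(itertools.product(range(n_actions), repeat=n_agents),
               key=global_q)

best = greedy_joint_action()
print(best, global_q(best))  # → (1, 0, 1) 5.0
```

Exhaustive enumeration is exponential in the number of agents; the max-plus algorithm exploits the factor-graph structure so that only local clique Q-tables need to be maximized over.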