Multi-agent graphical games with input constraints: an online learning solution (cited by: 2)

Abstract: This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. To obtain the optimal strategy of each agent, a set of coupled Hamilton-Jacobi-Bellman (HJB) equations must be solved, which is very difficult with traditional methods. The game becomes more complex still when the control input of each agent is constrained. An online iterative algorithm is proposed that solves the dynamic graphical game without requiring the drift dynamics of the agents; in effect, the algorithm finds the optimal solution of the Bellman equations online. The solution employs a distributed policy iteration process that uses only the local information available to each agent. It can be proved that, under certain conditions, when all agents update their strategies simultaneously, the whole multi-agent system reaches a Nash equilibrium. In the implementation, two layers of neural networks are used for each agent to fit the value function and the control strategy, respectively. Finally, a simulation example demonstrates the effectiveness of the method.

Authors: Tianxiang WANG, Bingchang WANG, Yong LIANG

Affiliation: School of Control Science and Engineering

Source: Control Theory and Technology (English edition), indexed in EI and CSCD, 2020, No. 2, pp. 148-159 (12 pages)

Funding: Supported by the National Natural Science Foundation of China (Nos. 61773241, 61973183) and the Shandong Provincial Natural Science Foundation (No. ZR2019MF041).

Keywords: actor-critic algorithm; differential games; input constraints; neural network (NN); reinforcement learning (RL)

Classification number: TP39 [Automation and Computer Technology: Computer Application Technology]
