To address two problems in the mean-field multi-agent reinforcement learning algorithm M3-UCRL, namely inaccurate prediction of the next state by the environment dynamics model and a shortage of samples for policy learning, this paper exploits the data-generation capability of Denoising Diffusion Probabilistic Models (DDPM) and proposes a DDPM-based mean-field multi-agent reinforcement learning algorithm (DDPM-M3RL). The algorithm formulates learning the environment model as a denoising problem: applying DDPM improves the accuracy of the environment model's next-state predictions and supplies ample sample data for subsequent policy learning, which accelerates convergence of the policy model. Experimental results show that the algorithm effectively improves the accuracy of the environment dynamics model's next-state predictions, that the state-transition data generated by the model provides sufficient learning samples for policy learning, and that it effectively improves the performance and stability of the navigation policy.
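The abstract's key step, treating next-state generation as a denoising problem, can be sketched with the standard DDPM forward and reverse updates. The linear noise schedule, the toy one-dimensional "state" vectors, and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Minimal sketch of the DDPM forward (noising) and reverse (denoising) updates
# behind DDPM-based next-state generation. Schedule and data are toy choices.

T = 100
betas = np.linspace(1e-4, 0.02, T)      # noise schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative products \bar{alpha}_t

def q_sample(x0, t, eps):
    """Forward process: draw x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def p_sample(xt, t, eps_hat, z):
    """One reverse (denoising) step, given a noise estimate eps_hat."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    sigma = np.sqrt(betas[t]) if t > 0 else 0.0
    return mean + sigma * z

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8,))              # toy batch of "next states"
eps = rng.normal(size=(8,))
xT = q_sample(x0, T - 1, eps)           # fully noised states
x_prev = p_sample(xT, T - 1, eps, rng.normal(size=(8,)))  # one denoising step
```

In the full algorithm a learned noise-prediction network would supply `eps_hat`, conditioned on the current state, action, and mean field; the true noise is passed in above only so the closed-form updates can be checked.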
This paper considers the finite-horizon H_2/H_∞ control problem of mean-field type for discrete-time systems. First, the authors derive a mean-field stochastic bounded real lemma (SBRL). Second, a sufficient condition for the solvability of discrete-time mean-field stochastic linear-quadratic (LQ) optimal control is presented. Third, based on the SBRL and LQ results, the paper establishes a sufficient condition for the existence of discrete-time mean-field stochastic H_2/H_∞ control via the solvability of coupled matrix-valued equations.
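For orientation, a discrete-time mean-field stochastic system of the kind studied in this literature typically has the following shape, where the expectations E x_k and E u_k are the mean-field terms, u_k is the control, v_k the disturbance, and w_k the system noise; the notation is illustrative rather than the paper's exact model:

```latex
% Generic discrete-time mean-field stochastic system (illustrative notation):
\begin{aligned}
x_{k+1} ={} & \big(A x_k + \bar{A}\,\mathbb{E}x_k + B u_k + \bar{B}\,\mathbb{E}u_k
              + C v_k + \bar{C}\,\mathbb{E}v_k\big) \\
            & + \big(D x_k + \bar{D}\,\mathbb{E}x_k + F u_k + \bar{F}\,\mathbb{E}u_k\big)\, w_k .
\end{aligned}
```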
This paper studies the decentralized optimal control of discrete-time systems with input delay, in which a large number of agents have identical decoupled dynamics and are coupled through the mean field in the cost function. Decentralized and centralized optimal controllers are derived from the optimal tracking solution of the LQR problem with delay, and it is proved that the centralized and decentralized solutions yield equivalent optimal controllers and optimal cost. An illustrative example demonstrates the efficiency of the decentralized optimal controllers.
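The LQR machinery this abstract relies on can be illustrated with a plain finite-horizon backward Riccati recursion. The scalar system, horizon, and weights below are illustrative assumptions; the paper additionally handles input delay and mean-field coupling, which this sketch omits:

```python
# Finite-horizon discrete-time LQR for x_{k+1} = A x_k + B u_k with stage cost
# Q x_k^2 + R u_k^2, solved by the backward Riccati recursion.

A, B, Q, R = 1.0, 1.0, 1.0, 1.0   # scalar dynamics and cost weights (toy values)
N = 50                            # horizon length

P = Q                             # terminal cost-to-go P_N = Q
gains = []
for _ in range(N):                # backward in time: k = N-1, ..., 0
    K = (B * P * A) / (R + B * P * B)   # feedback gain, u_k = -K x_k
    P = Q + A * P * A - A * P * B * K   # Riccati update for P_k
    gains.append(K)
gains.reverse()                   # gains[k] is now the gain at stage k
```

For this scalar example the recursion converges quickly to the stationary solution P = (1 + √5)/2, so the early-stage gains are essentially the infinite-horizon gain 1/P.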
Mean-field theory has attracted considerable interest in recent years (see in particular the results of Lasry-Lions in 2006 and 2007, of Guéant-Lasry-Lions in 2011, of Huang-Caines-Malhamé in 2007, and many others), and it has many applications. In general, these applications approximate an infinite number of players with common behavior by a representative agent. This agent must solve a control problem perturbed by a field equation that represents, in some way, the average behavior of the infinite population of agents. This approach does not extend easily to Nash equilibrium problems for a finite number of players perturbed by field equations, unless one averages within different groups, which has not been done in the literature and seems quite challenging. In this paper, the authors approach similar problems with a different motivation that makes sense both for control and for differential games. Thus, systems of nonlinear partial differential equations with mean-field terms, which have not been addressed in the literature so far, are considered here.
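For context, the canonical Lasry-Lions mean-field game couples the representative agent's Hamilton-Jacobi-Bellman equation (backward in time) with a Fokker-Planck equation for the population density (forward in time); this standard system is shown only as background, not as the new PDE systems introduced in the paper:

```latex
% Canonical mean-field game system: u is the representative agent's value
% function, m the population density, H the Hamiltonian.
\begin{cases}
-\partial_t u - \nu \Delta u + H(x, \nabla u) = f(x, m), \\
\;\;\,\partial_t m - \nu \Delta m - \operatorname{div}\!\big(m\,\nabla_p H(x, \nabla u)\big) = 0, \\
\;\; u(T, x) = g\big(x, m(T)\big), \qquad m(0) = m_0 .
\end{cases}
```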
This paper considers partially observed optimal control of forward-backward stochastic systems driven by Brownian motions and an independent Poisson random measure, with the feature that the cost functional is of mean-field type. When the coefficients of the system and the objective performance functionals are allowed to be random, and possibly non-Markovian, Malliavin calculus is employed to derive a maximum principle for the optimal control of such a system, in which the adjoint process is expressed explicitly. The authors also investigate the mean-field optimal control problem for systems driven by mean-field forward-backward stochastic differential equations (FBSDEs) with jumps, where the coefficients depend not only on the state process but also on its expectation, under partially observed information. The maximum principle is established using a convex variational technique, and an example illustrates the obtained results.
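The class of systems described here can be written, in generic and purely illustrative notation, as a controlled mean-field FBSDE with jumps, where W is a Brownian motion, Ñ a compensated Poisson random measure, and the coefficients depend on both the state and its expectation:

```latex
% Generic controlled mean-field FBSDE with jumps (illustrative notation):
\begin{aligned}
dX_t &= b\big(t, X_t, \mathbb{E}[X_t], u_t\big)\,dt
        + \sigma\big(t, X_t, \mathbb{E}[X_t], u_t\big)\,dW_t
        + \int_{\mathbb{R}_0} \gamma\big(t, X_{t-}, e, u_t\big)\,\tilde{N}(dt, de), \\
dY_t &= -f\big(t, X_t, \mathbb{E}[X_t], Y_t, Z_t, u_t\big)\,dt
        + Z_t\,dW_t + \int_{\mathbb{R}_0} K_t(e)\,\tilde{N}(dt, de), \\
X_0 &= x_0, \qquad Y_T = h\big(X_T, \mathbb{E}[X_T]\big).
\end{aligned}
```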
Funding (finite-horizon mean-field H_2/H_∞ control paper): supported by the National Natural Science Foundation of China under Grant Nos. 61573227 and 61633014, the Research Fund for the Taishan Scholar Project of Shandong Province of China, the SDUST Research Fund under Grant No. 2015TDJH105, and the State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources under Grant No. LAPS16011.
Funding (decentralized optimal control with input delay paper): supported by the Taishan Scholar Construction Engineering by Shandong Government, the National Natural Science Foundation of China under Grant Nos. 61120106011, 61203029, and 61104050, the Natural Science Foundation of Shandong Province under Grant No. ZR2011FQ020, the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20120131120058, and the Scientific Research Foundation for Outstanding Young Scientists of Shandong Province under Grant No. BS2013DX008.
Funding (mean-field theory paper): supported by the WCU World Class University program through the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology (No. R31-20007), and the Research Grants Council of HKSAR (No. PolyU 5001/11P).
Funding (partially observed mean-field control paper): supported by the National Natural Science Foundation of China under Grant Nos. 11471051 and 11371362, the Teaching Mode Reform Project of BUPT under Grant No. BUPT2015JY52, the National Natural Science Foundation of China under Grant No. 11371029, the Natural Science Foundation of Anhui Province under Grant No. 1508085JGD10, the National Natural Science Foundation of China under Grant No. 71373043, the National Social Science Foundation of China under Grant No. 14AZD121, and the Scientific Research Project Achievement of UIBE Networking Collaboration Center for China's Multinational Business under Grant No. 201502YY003A.