A Study of the Policy Iteration Algorithm for Mean-Field Control Systems with Non-Separable Hamiltonians
1
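Author: 唐毅雄. 《应用数学进展》, 2023, No. 5, pp. 2364-2375, 12 pages.
Mean-field control studies systems with a large number of agents in which a central controller computes the Pareto optimum of the system and all individuals use, a priori, exactly the same feedback control. This paper studies a finite-difference numerical method for a class of backward-forward systems of partial differential equations arising from mean-field control problems; the algorithm is based on the policy iteration method from optimal control. The paper discusses the convergence and convergence rate of the algorithm over short time horizons, verifies its effectiveness on several examples, and uses numerical experiments to compare the overall payoff of the mean-field game Nash equilibrium with that of the mean-field control Pareto optimum.
Keywords: mean-field control; differential games; optimal control; numerical methods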
A Mean-Field Multi-Agent Reinforcement Learning Algorithm Based on Denoising Diffusion Probabilistic Models
2
Authors: 单国强, 缪霏阳, 张子胤, 李大鹏. 《软件工程与应用》, 2024, No. 5, pp. 704-719, 16 pages.
To address inaccurate next-state prediction by the environment dynamics model and the shortage of samples for policy learning in the mean-field-based multi-agent reinforcement learning algorithm M3-UCRL, this paper exploits the data-generation capability of denoising diffusion probabilistic models (DDPM) and proposes a DDPM-based mean-field multi-agent reinforcement learning algorithm, DDPM-M3RL. The algorithm formulates the generation of the environment model as a denoising problem. Using DDPM improves the accuracy of the environment model's next-state predictions and supplies sufficient sample data for subsequent policy learning, which speeds up convergence of the policy model. Experimental results show that the algorithm effectively improves the accuracy of the environment dynamics model's next-state predictions, and that the state-transition data generated by the environment dynamics model provide sufficient samples for policy learning, improving the performance and stability of the navigation policy.
Keywords: multi-agent reinforcement learning; denoising diffusion probabilistic models; mean-field control; policy learning
A Multi-Agent Reinforcement Learning Algorithm Based on Mean-Field Intrinsic Rewards
3
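Authors: 孙文绮, 李大鹏, 田峰, 丁良辉. 《无线电通信技术》, 2023, No. 3, pp. 556-565, 10 pages.
In complex multi-agent scenarios, a simple reward function designed solely from the final goal cannot effectively guide agents' policy learning. To address this, the paper proposes a multi-agent reinforcement learning algorithm based on mean-field intrinsic rewards: Model-based Multi-agent Mean-field Intrinsic Reward Upper Confidence Reinforcement Learning (M3IR-UCRL). The algorithm adds an intrinsic reward module to the reward function; the generated intrinsic reward, together with the extrinsic reward that defines the task, helps the representative agent learn a policy in a multi-agent system simplified by mean-field control (MFC). During learning, the agent first updates the policy parameters along the gradient of the expected cumulative weighted sum of intrinsic and extrinsic rewards, and then updates the intrinsic reward parameters along the gradient of the expected cumulative extrinsic reward. Simulation results show that, compared with the M3-UCRL (Model-based Multi-agent Mean-field Upper Confidence Reinforcement Learning) algorithm, which guides learning with only a simple extrinsic reward, the proposed algorithm raises the task completion rate in complex multi-agent scenarios and lowers the rate of collisions with the surrounding environment, improving overall performance.
Keywords: multi-agent systems; mean-field control; model-based reinforcement learning; intrinsic reward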
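The two-step update described in the abstract of entry 3 can be written schematically as below. This is only a sketch under assumed notation, none of which is taken from the paper itself: θ denotes the policy parameters, η the intrinsic reward parameters, λ the weight on the intrinsic reward, γ a discount factor, and α, β the step sizes.

```latex
\begin{aligned}
J^{\mathrm{ex+in}}(\theta,\eta)
  &= \mathbb{E}_{\pi_\theta}\Big[\sum_{t}\gamma^{t}\big(r^{\mathrm{ex}}_{t}+\lambda\,r^{\mathrm{in}}_{\eta,t}\big)\Big],
&\qquad
J^{\mathrm{ex}}(\theta)
  &= \mathbb{E}_{\pi_\theta}\Big[\sum_{t}\gamma^{t}\,r^{\mathrm{ex}}_{t}\Big],\\[4pt]
\theta' &= \theta+\alpha\,\nabla_{\theta}J^{\mathrm{ex+in}}(\theta,\eta),
&\qquad
\eta' &= \eta+\beta\,\nabla_{\eta}J^{\mathrm{ex}}\big(\theta'(\eta)\big).
\end{aligned}
```

The second update treats the new policy parameters θ' as a function of η, so the intrinsic reward is adjusted only insofar as it helps the extrinsic (task) objective.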
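Maximum Principle for Mean-Field Forward-Backward Stochastic Delayed Control Systems with Terminal Constraints and Its Application to Mean-Field Games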
4
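Author: 郝涛. 《数学年刊(A辑)》 (CSCD, PKU Core), 2020, No. 3, pp. 331-356, 26 pages.
This paper studies an optimal control problem for mean-field forward-backward stochastic control systems with delay and terminal state constraints. The coefficients driving the system depend on the solution, the delayed solution, and their distributions. Using the Lions derivative, the terminal perturbation method, and Ekeland's variational principle, two stochastic maximum principles are obtained. The theory is illustrated by a linear-quadratic problem and by a mean-field game of optimal production-consumption choice.
Keywords: mean-field forward-backward delayed control system; stochastic maximum principle; terminal perturbation method; Ekeland's variational principle; mean-field games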
Convergence Rate of Discrete Feedback Control for Mean-Field Linear-Quadratic Optimal Control Problems
5
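Author: 王燕青. 《中国科学:数学》 (CSCD, PKU Core), 2023, No. 8, pp. 1145-1162, 18 pages.
This paper proposes a feedback-control-based numerical algorithm for the linear-quadratic (LQ) optimal control problem of mean-field systems. First, the original problem is decomposed into two subproblems: an LQ problem for a stochastic system and an LQ problem for a deterministic system. Next, each subproblem is discretized using a feedback control strategy, and the convergence rate of the discretization is proved. Finally, numerical examples are given to support the theoretical results.
Keywords: convergence rate; mean-field linear-quadratic optimal control problem; closed-loop control strategy; Riccati equation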
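For readers unfamiliar with Riccati-based feedback control, which entry 5 lists among its keywords, the sketch below shows the textbook backward Riccati recursion for a scalar, deterministic, discrete-time LQ problem. It is not the discretization scheme of the paper, and it does not reproduce the mean-field decomposition; the function names and the parameters `a`, `b`, `q`, `r`, `g` are illustrative assumptions.

```python
def riccati_gains(a, b, q, r, g, n_steps):
    """Backward Riccati recursion for the scalar system
        x[k+1] = a*x[k] + b*u[k]
    with cost  sum_{k<N} (q*x[k]**2 + r*u[k]**2) + g*x[N]**2  (r > 0).
    Returns (gains, p0): feedback gains K[0..N-1] for u[k] = -K[k]*x[k],
    and P[0], so that the optimal cost from x0 is P[0]*x0**2."""
    p = g  # terminal condition P[N] = g
    gains = [0.0] * n_steps
    for k in reversed(range(n_steps)):
        denom = r + b * b * p                      # uses P[k+1]
        gains[k] = a * b * p / denom               # K[k]
        p = q + a * a * p - (a * b * p) ** 2 / denom  # P[k]
    return gains, p


def closed_loop_cost(a, b, q, r, g, gains, x0):
    """Simulate u[k] = -gains[k]*x[k] and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for k_gain in gains:
        u = -k_gain * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + g * x * x  # add terminal cost


if __name__ == "__main__":
    gains, p0 = riccati_gains(a=1.0, b=0.5, q=1.0, r=0.1, g=1.0, n_steps=20)
    simulated = closed_loop_cost(1.0, 0.5, 1.0, 0.1, 1.0, gains, x0=2.0)
    # The two numbers should agree up to floating-point error.
    print(p0 * 2.0 ** 2, simulated)
```

The simulated closed-loop cost matching `P[0] * x0**2` is a quick sanity check that the gains are the optimal feedback for this toy problem.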
Derivation of Key Formulas for Computing the Effective Properties of Composite Materials by the Self-Consistent Method
6
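Author: 霍凯成. 《武汉理工大学学报》 (CAS, CSCD), 2001, No. 8, pp. 42-44, 3 pages.
For the self-consistent method of computing the effective properties of elastic composite materials used in references [4-6], this paper derives the modified …
Keywords: composite materials; effective properties; self-consistent method; Green's formula; effective elastic modulus; averaged governing equation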
Discrete-Time Mean-Field Stochastic H_2/H_∞ Control (cited by: 2)
7
Authors: ZHANG Weihai, MA Limin, ZHANG Tianliang. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2017, No. 4, pp. 765-781, 17 pages.
This paper considers the finite-horizon H_2/H_∞ control problem of mean-field type for discrete-time systems. First, the authors derive a mean-field stochastic bounded real lemma (SBRL). Second, a sufficient condition for the solvability of the discrete-time mean-field stochastic linear-quadratic (LQ) optimal control problem is presented. Third, based on the SBRL and LQ results, a sufficient condition for the existence of discrete-time stochastic H_2/H_∞ control of mean-field type is established via the solvability of coupled matrix-valued equations.
Keywords: discrete-time systems; H_2/H_∞ control; mean-field
Decentralized Control of Discrete-Time System with Delay in Mean Field LQR Problem
8
Authors: ZHANG Fangfang, WANG Wei. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2015, No. 4, pp. 755-772, 18 pages.
This paper studies decentralized optimal control of a discrete-time system with input delay, in which a large number of agents have identical decoupled dynamics and a cost function coupled through the mean field. Decentralized and centralized optimal controllers are derived from the optimal tracking control of an LQR problem with delay. It is proved that the optimal controllers and the optimal cost functions of the centralized and decentralized solutions are equivalent for the optimal control problem. An illustrative example shows the efficiency of the decentralized optimal controllers.
Keywords: decentralized control; discrete-time system; input delay; mean field
Control and Nash Games with Mean Field Effect
9
Authors: Alain BENSOUSSAN, Jens FREHSE. Chinese Annals of Mathematics, Series B (SCIE, CSCD), 2013, No. 2, pp. 161-192, 32 pages.
Mean field theory has attracted considerable interest in recent years (see in particular the results of Lasry-Lions in 2006 and 2007, of Guéant-Lasry-Lions in 2011, of Huang-Caines-Malhamé in 2007, and many others). There are many applications. In general, the applications concern approximating an infinite number of players with common behavior by a representative agent. This agent has to solve a control problem perturbed by a field equation, which represents in some way the behavior of the average of the infinite number of agents. This approach does not lead easily to problems of Nash equilibrium for a finite number of players perturbed by field equations, unless one considers averaging within different groups, which has not been done in the literature and seems quite challenging. In this paper, the authors approach similar problems with a different motivation, which makes sense for control and also for differential games. Thus systems of nonlinear partial differential equations with mean field terms, which have not been addressed in the literature so far, are considered here.
Keywords: mean field; dynamic programming; Nash games; equilibrium; calculus of variations
On Optimal Mean-Field Control Problem of Mean-Field Forward-Backward Stochastic System with Jumps Under Partial Information
10
Authors: ZHOU Qing, REN Yong, WU Weixing. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2017, No. 4, pp. 828-856, 29 pages.
This paper considers partially observed optimal control for forward-backward stochastic systems driven by Brownian motions and an independent Poisson random measure, where the cost functional is of mean-field type. When the coefficients of the system and the objective performance functionals are allowed to be random, possibly non-Markovian, Malliavin calculus is employed to derive a maximum principle for the optimal control of such a system, in which the adjoint process is explicitly expressed. The authors also investigate the mean-field type optimal control problem for systems driven by mean-field type forward-backward stochastic differential equations (FBSDEs for short) with jumps, where the coefficients contain not only the state process but also its expectation under partially observed information. The maximum principle is established using a convex variational technique. An example is given to illustrate the results.
Keywords: forward-backward stochastic differential equation; Girsanov's theorem; jump diffusion; Malliavin calculus; maximum principle; mean-field type; partial information