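Abstract: To address the dynamic attrition of UAV numbers during multi-UAV adversarial games, as well as the sparse-reward problem and the excessively high sampling frequency of ineffective experiences in traditional deep reinforcement learning algorithms, this paper takes the multi-UAV adversarial game task under limited attack-defense capability and limited communication range as its research background and constructs an adversarial game model of red and blue UAV swarms. Within the Actor-Critic framework of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the original MADDPG algorithm is improved according to the characteristics of the game environment. To further improve the algorithm's exploration and exploitation of effective experiences, a rule-coupling module is built to assist the Actor network during UAV decision-making. Simulation experiments show that the proposed algorithm achieves improvements in convergence speed, learning efficiency, and stability. The introduction of heterogeneous sub-networks makes the algorithm better suited to game scenarios in which the number of UAVs dynamically decreases; the prioritized experience replay method that couples a reward potential function with importance weights refines the differentiation among experiences and increases the utilization of advantageous experiences; and the rule-coupling module enables the UAV decision network to make effective use of prior knowledge.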
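For readers unfamiliar with prioritized experience replay, the following is a minimal sketch of the generic proportional scheme with importance-sampling weights. It is not the paper's method: the abstract does not specify how the reward potential function is coupled into the priorities, so that part is omitted. The class name `PrioritizedReplayBuffer` and the parameters `alpha`, `beta`, and `eps` are assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative, not the paper's variant)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=None):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta      # strength of the importance-sampling correction
        self.eps = eps        # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0          # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions receive the current maximum priority so that each
        # one is likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sampling probability P(i) is proportional to priority_i ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance weights w_i = (N * P(i)) ** (-beta), normalised by the
        # largest weight in the sampled batch.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities are refreshed from the TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, the returned weights would typically scale each sample's critic loss, and `update_priorities` would be called with the new TD errors after the update.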
Abstract: The game of chess offers a rich variety of neighborhood types. Chess pieces move vertically, horizontally, diagonally, up/down, or in combinations of these, across one or many squares of the board. These movements can be associated with neighborhoods. Our work aims to establish a behavioral approximation between calculations carried out with traditional computational tools such as ordinary differential equations (ODEs) and the evolution of cell values caused by chess moves. Our proposal is based on a grid. The cells' values change as time passes, depending on both their neighborhood and an update rule. This framework succeeds in matching real data in the cases of the ODEs used in compartmental models of disease spread, such as the well-known Susceptible-Infected-Recovered (SIR) model and its derivatives, as well as in the case of population dynamics in competition for resources, described by the Lotka-Volterra model.
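The grid-plus-neighborhood idea can be illustrated with a small stochastic cellular automaton whose aggregate counts behave like an SIR epidemic. This is only a sketch under stated assumptions: the abstract does not give the authors' update rule, so the infection rule below (each infected neighbor transmits independently with probability `beta`, and infected cells recover with probability `gamma`) is hypothetical, as are the names `KING`, `KNIGHT`, `step`, and `simulate`.

```python
import random

# Hypothetical chess-inspired neighbourhoods, given as (row, col) offsets.
KING = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
KNIGHT = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

S, I, R = 0, 1, 2  # susceptible, infected, recovered


def step(grid, offsets, beta, gamma, rng):
    """Advance the grid one time step with a synchronous update."""
    n_rows, n_cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] == S:
                # Count infected neighbours that lie on the board.
                k = sum(
                    1
                    for dr, dc in offsets
                    if 0 <= r + dr < n_rows
                    and 0 <= c + dc < n_cols
                    and grid[r + dr][c + dc] == I
                )
                # Assumed rule: each infected neighbour transmits
                # independently with probability beta.
                if rng.random() < 1.0 - (1.0 - beta) ** k:
                    new[r][c] = I
            elif grid[r][c] == I and rng.random() < gamma:
                new[r][c] = R
    return new


def simulate(size=20, steps=50, offsets=KING, beta=0.3, gamma=0.1, seed=0):
    """Return a list of (S, I, R) cell counts, one tuple per time step."""
    rng = random.Random(seed)
    grid = [[S] * size for _ in range(size)]
    grid[size // 2][size // 2] = I  # single infected cell in the centre
    history = []
    for _ in range(steps):
        flat = [cell for row in grid for cell in row]
        history.append((flat.count(S), flat.count(I), flat.count(R)))
        grid = step(grid, offsets, beta, gamma, rng)
    return history
```

Calling `simulate(offsets=KNIGHT)` instead of the default `KING` neighborhood changes how the infection front spreads across the board while the total number of cells stays constant, which is the kind of neighborhood-dependent behavior the abstract describes.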
Abstract: On the basis of linguistics, psychology, and other related theories, we should adopt a game-based teaching method so that students can learn easily and happily, given the lack of interest and other problems in some parts of our country.