Multi-agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high-dimensional continuous spaces, the non-stationary environment can provide outdated experiences that hinder convergence, resulting in ineffective training performance for multi-agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed that generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by engaging in actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, allowing agents to learn and reuse complex action sequences and accelerating the convergence speed of multi-agent learning. MioDSC was evaluated in the multi-agent particle environment and the StarCraft multi-agent challenge at varying difficulty levels. The experimental results demonstrate that MioDSC outperforms state-of-the-art methods and is robust across various multi-agent system tasks with high stability.
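The idea of an intrinsic reward based on the mutual information between actions and environment states can be illustrated with a small sketch. This is not the MioDSC implementation: the abstract does not say how the mutual information is estimated, and a method for continuous spaces would presumably need a learned estimator. The sketch below assumes discrete, hashable states and actions and uses a simple plug-in estimate; the function names `empirical_mutual_information` and `shaped_reward` and the weight `beta` are hypothetical.

```python
import math
from collections import Counter


def empirical_mutual_information(pairs):
    """Plug-in estimate of I(S; A) in nats from (state, action) samples.

    Assumption: states and actions are discrete and hashable.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                    # counts of (s, a)
    states = Counter(s for s, _ in pairs)     # counts of s
    actions = Counter(a for _, a in pairs)    # counts of a
    mi = 0.0
    for (s, a), c in joint.items():
        # p(s, a) * log( p(s, a) / (p(s) * p(a)) ), written with counts
        mi += (c / n) * math.log(c * n / (states[s] * actions[a]))
    return mi


def shaped_reward(extrinsic, pairs, beta=0.1):
    """Add a mutual-information bonus (hypothetical weight beta) to the task reward."""
    return extrinsic + beta * empirical_mutual_information(pairs)


# Actions fully determined by the state: the estimate equals log(2).
dependent = [(0, "left"), (1, "right")] * 50
# Actions independent of the state: the estimate is zero.
independent = [(0, "left"), (0, "right"), (1, "left"), (1, "right")] * 25

mi_dependent = empirical_mutual_information(dependent)
mi_independent = empirical_mutual_information(independent)
```

In this toy setting the bonus is largest when an agent's actions depend strongly on the state it observes and vanishes when its actions ignore the state, which is the behaviour the abstract attributes to the intrinsic reward.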
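Group-controlled mechatronic equipment is distributed, operates in parallel, and runs autonomously. To optimise scheduling and improve operating efficiency, a hierarchically structured multi-agent intelligent scheduling model is proposed on the basis of multi-agent theory and technology, and key issues including the multi-agent system architecture, the agent cooperation mechanism, and the control algorithms are studied. A simulation case study of group-control dispatching for a typical elevator installation shows that the method is feasible and correct and offers clear advantages.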
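A multi-agent-based architectural framework for a virtual maintenance training system (VMTS) is proposed. The system consists of three interacting agents: a master control agent, a simulation agent, and an interface agent, which turns the development of a VMTS into the design and development of a multi-agent system. The multi-agent framework can implement an intelligent model of the trainee and behaviour models of the virtual objects in the virtual training scene, improving the robustness and reusability of the VMTS. Interaction and cooperation among the agents are implemented on the basis of a conceptual agent model, and the concrete implementation of the master control agent and the simulation agent is described.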
Funding: National Natural Science Foundation of China, Grant/Award Number: 61872171; The Belt and Road Special Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Grant/Award Number: 2021490811.