This paper studies the problem of time-varying formation control with finite-time prescribed performance for nonstrict-feedback second-order multi-agent systems with unmeasured states and unknown nonlinearities. To handle the unknown nonlinearities, neural networks are applied to approximate the inherent dynamics of the system. In addition, due to the limitations of actual working conditions, each follower agent can only obtain locally measurable partial state information of the leader agent. To address this problem, a neural network state observer based on the leader state information is designed. Then, a finite-time prescribed performance adaptive output feedback control strategy is proposed by restricting the sliding mode surface to a prescribed region, which ensures that the closed-loop system has practical finite-time stability and that the formation errors of the multi-agent systems converge to the prescribed performance bound in finite time. Finally, a numerical simulation is provided to demonstrate the practicality and effectiveness of the developed algorithm.
The emergence of beyond-5G networks has the potential to deliver seamless and intelligent connectivity on a global scale. Network slicing is crucial for delivering services to different, demanding vertical applications in this context. Next-generation applications have time-sensitive requirements and depend on the most efficient routing path to ensure packets reach their intended destinations. However, existing Internet Protocol (IP) routing over multi-domain networks faces challenges in enforcing network slicing due to minimal collaboration and information sharing among network operators. Conventional inter-domain routing methods, such as the Border Gateway Protocol (BGP), cannot make routing decisions based on performance, which frequently results in traffic flowing across congested, suboptimal paths. To address these issues, we propose CoopAI-Route, a multi-agent cooperative deep reinforcement learning (DRL) system built on hierarchical software-defined networks (SDN). This framework enforces network slicing in multi-domain networks and supports cooperative communication among administrators to find performance-based routes both within and across domains. CoopAI-Route employs the Distributed Global Topology (DGT) algorithm to define inter-domain Quality of Service (QoS) paths. It uses a DRL agent with a message-passing multi-agent Twin-Delayed Deep Deterministic Policy Gradient method to ensure optimal end-to-end routes adapted to the specific requirements of network slicing applications. Our evaluation demonstrates CoopAI-Route's strong performance in scalability, link failure handling, and adaptability to evolving topologies compared to state-of-the-art methods.
This paper is concerned with consensus of a second-order linear time-invariant multi-agent system in the situation where there is a communication delay among the agents in the network. A proportional-integral consensus protocol is designed using delayed and memorized state information. Under this protocol, the consensus problem of the multi-agent system is transformed into the problem of asymptotic stability of the corresponding linear time-invariant time-delay system. Note that the location of the eigenvalues of the corresponding characteristic function of the linear time-invariant time-delay system not only determines the stability of the system, but also plays a critical role in its dynamic performance. In this paper, based on recent results on the distribution of roots of quasi-polynomials, several necessary conditions for Hurwitz stability of a class of quasi-polynomials are first derived. Then allowable regions of the consensus protocol parameters are estimated. Some necessary and sufficient conditions for determining effective protocol parameters are provided. The designed protocol can achieve consensus and improve the dynamic performance of the second-order multi-agent system. Moreover, the effects of delays on consensus of systems of harmonic oscillators and double integrators under proportional-integral consensus protocols are investigated. Furthermore, some results on proportional-integral consensus are derived for a class of high-order linear time-invariant multi-agent systems.
In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our investigation examines the merging of Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. We study the interactions of AI-driven agents as they collaborate and compete within the financial realm, employing deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to uncover trends, mitigate risks, and provide tailored financial guidance, benefiting individuals and enterprises navigating the digital landscape. Our research holds the potential to open fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, supporting the efficacy and reliability of our approach within Fintech.
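The abstract quotes precision and F1 but not recall. As a rough consistency check, and assuming that F1 here is the standard harmonic mean of precision and recall and that both figures refer to the same class and test set (the abstract does not say), the implied recall can be back-solved. The function names below are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, which gives R = F1 * P / (2P - F1)."""
    denominator = 2.0 * precision - f1
    if denominator <= 0.0:
        raise ValueError("no valid recall: F1 must be below 2 * precision")
    return f1 * precision / denominator


# Figures quoted in the abstract (assumed to describe the same evaluation).
recall = implied_recall(precision=0.88, f1=0.86)
print(f"implied recall = {recall:.3f}")
```

Under these assumptions the implied recall is roughly 0.84, slightly below the reported precision.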
To address the low solution accuracy and high decision pressure that a single agent faces in large-scale dynamic task allocation (DTA) with a high-dimensional decision space, this paper draws on deep reinforcement learning (DRL) theory and proposes an improved Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG-D2), built on a multi-agent architecture with a dual experience replay pool and dual noise, to improve the efficiency of DTA. The algorithm builds on the traditional Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. It introduces a dual noise mechanism to enlarge the action exploration space in the early stage of training and a dual experience pool to improve the data utilization rate. In addition, to accelerate agent training and to solve the cold-start problem, prior knowledge is applied during training. Finally, the MADDPG-D2 algorithm is compared and analyzed on a digital battlefield of ground and air confrontation. The experimental results show that agents trained with MADDPG-D2 achieve higher win rates and average rewards, use resources more reasonably, and better handle the high-dimensional decision spaces that are difficult for traditional single-agent algorithms. The results indicate that the proposed MADDPG-D2 algorithm has certain advantages and is reasonable for DTA.
Heat and mass transport through evaporation or drying processes occurs in many applications, such as food processing, pharmaceutical products, solar-driven vapor generation, textile design, and electronic cigarettes. This paper examines the transport of water from a fresh potato, treated as a wet porous medium, under laminar convective flow of dry air governed by Darcy's law in two dimensions. The governing equations of mass conservation, momentum conservation, multiphase fluid flow in porous media, heat transfer, and transport of participating fluids and gases through evaporation from the liquid to the gaseous phase are solved simultaneously. In this model, the design variable is the block locations, the objective function is the change in water saturation inside the porous medium, and the constraint is the constant mass of porous material. The results show that there is an optimal configuration for removing water from the specimen. The results are compared with experimental and analytical benchmarks. Then, for configuration optimization, multi-agent reinforcement learning (MARL) is used, with multiple porous blocks treated as agents that exchange their moisture content with the environment in a real-world scenario. MARL has been tested and validated against previous conventional optimization simulations, and its superiority is demonstrated. Our study examines and proposes an effective method for validating and testing multi-agent reinforcement learning models and methods using a multi-agent simulation.
Multi-Agent Reinforcement Learning (MARL) has proven to be successful in cooperative assignments. MARL is used to investigate how autonomous agents with the same interests can connect and act as one team. MARL cooperation scenarios are explored in recreational cooperative augmented reality environments, as well as in real-world robotics scenarios. In this paper, we explore MARL and its potential applications in cooperative assignments. Our focus is on developing a multi-agent system that can collaborate to attack or defend against enemies and achieve victory with minimal damage. To accomplish this, we use the StarCraft Multi-Agent Challenge (SMAC) environment and train four MARL algorithms: Q-learning with Mixtures of Experts (QMIX), Value-Decomposition Network (VDN), Multi-agent Proximal Policy Optimizer (MAPPO), and Multi-Agent Actor Attention Critic (MAA2C). These algorithms allow multiple agents to cooperate in a specific scenario to achieve the targeted mission. Our results show that the QMIX algorithm outperforms the other three algorithms in the attacking scenario, while the VDN algorithm achieves the best results in the defending scenario. Specifically, the VDN algorithm reaches the highest value of battle won mean and the lowest value of dead allies mean. Our research demonstrates the potential for MARL algorithms to be used in real-world applications, such as controlling multiple robots to provide helpful services or coordinating teams of agents to accomplish tasks that would be impossible for a human to do. The SMAC environment provides a unique opportunity to test and evaluate MARL algorithms in a challenging and dynamic environment, and our results show that these algorithms can be used to achieve victory with minimal damage.
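For readers unfamiliar with VDN, the idea it is generally known for is additive value decomposition: the joint action value is modelled as the sum of per-agent utilities, so each agent can act greedily on its own utility and still maximise the joint value. The sketch below illustrates only that property with hand-written utility tables. The numbers and function names are made up for illustration and are not taken from the paper or from the SMAC code base.

```python
from itertools import product


def q_total(per_agent_q, joint_action):
    """Additive mixing: the joint value is the sum of per-agent utilities."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    """Each agent independently picks the action with its highest utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def brute_force_joint_action(per_agent_q):
    """Centralised check: search every joint action for the best total."""
    joint_actions = product(*(range(len(q)) for q in per_agent_q))
    return max(joint_actions, key=lambda ja: q_total(per_agent_q, ja))


# Hypothetical utilities for three agents with 3, 2 and 4 actions.
PER_AGENT_Q = [
    [0.1, 0.7, 0.3],
    [0.5, 0.2],
    [0.0, 0.4, 0.9, 0.1],
]

decentralised = greedy_joint_action(PER_AGENT_Q)
centralised = brute_force_joint_action(PER_AGENT_Q)
print(decentralised, centralised, q_total(PER_AGENT_Q, decentralised))
```

Because the mixing is a plain sum, the decentralised and centralised choices coincide. QMIX is usually described as relaxing the sum to a monotonic mixing function, which preserves the same property.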
A Multi-Agent System (MAS) is a promising approach to building complex systems. This paper introduces research on the Inner-Enterprise Credit Rating MAS (IECRMAS). To raise rating accuracy, we consider not only the rating target's information but also the evaluators' feature information, and we propose a rational rating-group formation algorithm based on an anti-bias measurement of the group. We also propose the rational rating individual, which consists of the evaluator and an assistant rating agent. A rational group formation protocol is designed to coordinate autonomous agents to perform the rating job.
A multi-agent based manufacturing execution system (MES) model is presented. It is open, modularized, distributed, configurable, integrable, and maintainable. By analyzing the MES domain in manufacturing systems, this paper proposes a multi-agent based MES model, analyzes the partitioned functions of MES in the model using Unified Modeling Language (UML) diagrams, and establishes the architecture of the MES currently being implemented. This MES can be easily integrated with enterprise resource planning (ERP), the floor control system (FCS), and other manufacturing applications.
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in the 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising way to decrease computation overhead and reduce task processing latency. However, the high mobility of vehicles makes it challenging for vehicles to make avatar migration decisions independently based on current and future vehicle status. To address these challenges, in this paper we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and the non-stationarity issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. 
Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
Using a multi-agent based approach to modeling complex systems, hierarchical simulation models of carrier-based aircraft catapult launch are developed. The ocean, carrier, aircraft, and atmosphere are treated as aggregation agents, while detailed components such as the catapult, landing gears, and disturbances are treated as meta-agents that belong to their respective aggregation agents. The model thus has two layers: the aggregation agent layer and the meta-agent layer. The information communication among all agents is described. Meta-agents within one aggregation agent communicate with each other directly by information sharing, whereas meta-agents belonging to different aggregation agents first exchange their information through the aggregation layer and then perceive it from the shared environment, that is, the aggregation agent. In this way, not only is the hierarchical model built, but the environment perceived by each agent is also specified. Meanwhile, the problem of balancing agent independence against the resource consumption of real-time communication within the multi-agent system (MAS) is resolved. Each agent involved in carrier-based aircraft catapult launch is described, taking into account the interaction between the disturbed atmospheric environment and multiple moving bodies, including the carrier, aircraft, and landing gears. The models of the reactive agents among them are derived using tensors, and the perceived messages and inner frameworks of each agent are characterized. Finally, some results of a simulation instance are given. Modeling and simulating a dynamic system with a multi-agent system helps express physical concepts and logical hierarchy clearly and precisely. 
The system model can readily incorporate other kinds of agents to achieve a precise simulation of more complex systems. This modeling technique decomposes the complex integral dynamic equations of multiple bodies into parallel operations of single agents, and it makes the program code convenient to extend, maintain, and reuse.
Consensus tracking control problems for multi-agent systems with single-integrator dynamics and switching topology are investigated. To design effective consensus tracking protocols for a more general class of networks, which ensure that the states of the agents converge to a constant or time-varying reference state, new consensus tracking protocols are proposed for a constant reference state and for a time-varying reference state, respectively. In particular, an improved condition on the switching interaction topology, in contrast to the spanning tree condition, is presented. Convergence analysis of the two consensus tracking protocols is then provided using Lyapunov stability theory. Moreover, the consensus tracking protocol with a time-varying reference state is extended to achieve formation control. By introducing a formation structure set, each agent can obtain its individual desired trajectory. Finally, several simulations are carried out to illustrate the effectiveness of the theoretical results. The test results show that the states of the agents converge to a desired constant or time-varying reference state. In addition, by selecting an appropriate structure set, agents can maintain the expected formation under random switching interaction topologies.
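The abstract does not state the protocol itself, so the following is only a minimal sketch of a textbook consensus tracking law for single integrators with a constant reference, u_i = -sum_j a_ij (x_i - x_j) - b_i (x_i - r), integrated with forward Euler. The two alternating graphs, the pinning gains, the step size and all names are assumptions chosen for illustration. They stand in for, but are not, the switching-topology condition proposed in the paper.

```python
def tracking_step(x, adjacency, pinning, reference, dt):
    """One Euler step of u_i = -sum_j a_ij (x_i - x_j) - b_i (x_i - r)."""
    n = len(x)
    new_x = []
    for i in range(n):
        coupling = sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
        tracking = pinning[i] * (reference - x[i])
        new_x.append(x[i] + dt * (coupling + tracking))
    return new_x


def simulate_tracking(x0, topologies, pinning, reference, dt, steps, dwell):
    """Cycle through the given topologies, holding each for `dwell` steps."""
    x = list(x0)
    for k in range(steps):
        adjacency = topologies[(k // dwell) % len(topologies)]
        x = tracking_step(x, adjacency, pinning, reference, dt)
    return x


# Neither graph is connected on its own; their union is the path 0-1-2.
TOPOLOGY_A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # only edge 0-1
TOPOLOGY_B = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # only edge 1-2
PINNING = [1.0, 0.0, 0.0]  # only agent 0 measures the reference

final_states = simulate_tracking(
    x0=[5.0, -3.0, 1.0],
    topologies=[TOPOLOGY_A, TOPOLOGY_B],
    pinning=PINNING,
    reference=2.0,
    dt=0.05,
    steps=4000,
    dwell=10,
)
print(final_states)
```

In this toy setting all three agents settle on the reference even though no single graph links every agent to agent 0, which is the kind of behaviour a relaxed topology condition is meant to cover.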
Consensus problems for discrete-time multi-agent systems are studied. To design effective consensus protocols that ensure the states of the agents converge to a common value, a new consensus protocol for general discrete-time multi-agent systems is proposed based on Lyapunov stability theory. For discrete-time multi-agent systems with a desired trajectory, trajectory tracking and formation control problems are studied. The main idea in the trajectory tracking problem is to design a trajectory controller such that each agent tracks the desired trajectory. For a type of formation problem with a fixed formation structure, a formation structure set is introduced. According to the formation structure set, each agent can track its individual desired trajectory. Finally, simulations are provided to demonstrate the effectiveness of the theoretical results. The numerical results show that the states of the agents converge to zero under the consensus protocol, that is, consensus is achieved asymptotically. In addition, with appropriately designed trajectory controllers, the simulation results show that the agents converge to the desired trajectory asymptotically and can form different formations.
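As a point of reference for what a discrete-time consensus protocol looks like, here is a minimal sketch of the standard neighbour-averaging update x_i(k+1) = x_i(k) + eps * sum_j a_ij (x_j(k) - x_i(k)). It is not the protocol proposed in the paper, whose form the abstract does not give. The ring graph, the step size eps and the function names are assumptions.

```python
def consensus_step(x, adjacency, eps):
    """Move each agent toward its neighbours by a fraction eps of the gap."""
    n = len(x)
    return [
        x[i] + eps * sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
        for i in range(n)
    ]


def run_consensus(x0, adjacency, eps, steps):
    """Iterate the update for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(x, adjacency, eps)
    return x


# Undirected ring of four agents; eps is kept below 1 / (maximum degree).
RING = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
INITIAL = [1.0, -2.0, 4.0, 0.5]

final = run_consensus(INITIAL, RING, eps=0.3, steps=200)
print(final)
```

For an undirected connected graph this update preserves the average of the initial states, so the common value reached is that average. Driving the states to zero, as the abstract reports, would correspond to a different protocol or to error coordinates.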
Formation control and obstacle avoidance for multi-agent systems have attracted increasing attention. In this paper, the problems of formation control and obstacle avoidance are investigated by means of a consensus algorithm. A novel distributed control model is proposed that lets the multi-agent system form the anticipated formation while avoiding obstacles. Based on the consensus algorithm, a distributed control function consisting of three terms (a formation control term, a velocity matching term, and an obstacle avoidance term) is presented. By establishing a novel formation control matrix, a formation control term is constructed such that the agents converge to consensus and reach the anticipated formation. A new obstacle avoidance function is developed using a modified potential field approach to ensure that obstacle avoidance is achieved whether the obstacle is moving or stationary. A velocity matching term is also introduced to guarantee that the velocities of all agents converge to the same value. Furthermore, the stability of the control model is proven. Simulation results are provided to demonstrate the effectiveness of the proposed control.
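The three-term structure described above can be sketched for double-integrator agents in the plane. This is a hedged illustration only: the formation term uses plain consensus on offset-corrected positions rather than the paper's formation control matrix, and the obstacle term is the classic repulsive potential gradient rather than the paper's modified potential field. All gains, offsets and names are assumptions.

```python
import math


def repulsive_force(position, obstacle, influence_radius, gain):
    """Gradient of a classic repulsive potential; zero outside the radius."""
    dx, dy = position[0] - obstacle[0], position[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist >= influence_radius or dist == 0.0:
        return (0.0, 0.0)
    magnitude = gain * (1.0 / dist - 1.0 / influence_radius) / dist**2
    return (magnitude * dx / dist, magnitude * dy / dist)


def control_input(i, pos, vel, offsets, adjacency, obstacle, params):
    """Sum of formation, velocity-matching and obstacle-avoidance terms."""
    u = [0.0, 0.0]
    for j, a_ij in enumerate(adjacency[i]):
        for axis in range(2):
            # Formation term: consensus on offset-corrected positions.
            u[axis] -= a_ij * ((pos[i][axis] - offsets[i][axis])
                               - (pos[j][axis] - offsets[j][axis]))
            # Velocity-matching term: consensus on velocities.
            u[axis] -= params["gamma"] * a_ij * (vel[i][axis] - vel[j][axis])
    push = repulsive_force(pos[i], obstacle, params["radius"], params["k_obs"])
    return (u[0] + push[0], u[1] + push[1])


def simulate(pos, vel, offsets, adjacency, obstacle, params, dt, steps):
    """Explicit Euler integration of double-integrator agents."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    n = len(pos)
    for _ in range(steps):
        inputs = [control_input(i, pos, vel, offsets, adjacency, obstacle, params)
                  for i in range(n)]
        for i in range(n):
            for axis in range(2):
                pos[i][axis] += dt * vel[i][axis]
                vel[i][axis] += dt * inputs[i][axis]
    return pos, vel


OFFSETS = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]  # desired triangle
COMPLETE = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
PARAMS = {"gamma": 1.0, "radius": 1.0, "k_obs": 1.0}

# The obstacle is placed far away here, so only the first two terms act.
final_pos, final_vel = simulate(
    pos=[(0.0, 0.0), (4.0, 1.0), (-2.0, 3.0)],
    vel=[(1.0, 0.0), (0.0, 1.0), (-1.0, 0.5)],
    offsets=OFFSETS, adjacency=COMPLETE, obstacle=(50.0, -50.0),
    params=PARAMS, dt=0.01, steps=3000,
)
print(final_pos, final_vel)
```

In this run the agents end up in the triangle given by `OFFSETS` while moving with a common velocity. Moving the obstacle onto their path would activate the repulsive term, but tuning it so that the formation recovers afterwards is beyond this sketch.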
Funding (time-varying formation control with finite-time prescribed performance): the National Natural Science Foundation of China (62203356) and the Fundamental Research Funds for the Central Universities of China (31020210502002).
Funding (proportional-integral consensus with communication delay): supported in part by the National Natural Science Foundation of China (NSFC) (61703086, 61773106) and the IAPI Fundamental Research Funds (2018ZCX27).
Funding (MARL and XAI for algorithmic trading): This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant No. IFPIP-1127-611-1443. The authors therefore acknowledge with thanks DSR's technical and financial support.
Funding (MADDPG-D2 dynamic task allocation): This research was funded by the Project of the National Natural Science Foundation of China, Grant Number 62106283.
Funding (MARL in the SMAC environment): supported in part by the United States Air Force Research Institute for Tactical Autonomy (RITA) University Affiliated Research Center (UARC), in part by the United States Air Force Office of Scientific Research (AFOSR) Contract FA9550-22-1-0268 awarded to KHA, https://www.afrl.af.mil/AFOSR/ (the contract is entitled "Investigating Improving Safety of Autonomous Exploring Intelligent Agents with Human-in-the-Loop Reinforcement Learning"), and in part by Jackson State University.
Abstract: Multi-Agent Reinforcement Learning (MARL) has proven successful in cooperative assignments. MARL is used to investigate how autonomous agents with the same interests can connect and act as one team. MARL cooperation scenarios are explored in recreational cooperative augmented reality environments, as well as in real-world scenarios in robotics. In this paper, we explore MARL and its potential applications in cooperative assignments. Our focus is on developing a multi-agent system that can collaborate to attack or defend against enemies and achieve victory with minimal damage. To accomplish this, we use the StarCraft Multi-Agent Challenge (SMAC) environment and train four MARL algorithms: Q-learning with Mixtures of Experts (QMIX), Value-Decomposition Network (VDN), Multi-agent Proximal Policy Optimizer (MAPPO), and Multi-Agent Actor Attention Critic (MAA2C). These algorithms allow multiple agents to cooperate in a specific scenario to achieve the targeted mission. Our results show that the QMIX algorithm outperforms the other three algorithms in the attacking scenario, while the VDN algorithm achieves the best results in the defending scenario. Specifically, the VDN algorithm reaches the highest value of battle won mean and the lowest value of dead allies mean. Our research demonstrates the potential for MARL algorithms to be used in real-world applications, such as controlling multiple robots to provide helpful services or coordinating teams of agents to accomplish tasks that would be impossible for a human to do. The SMAC environment provides a unique opportunity to test and evaluate MARL algorithms in a challenging and dynamic environment, and our results show that these algorithms can be used to achieve victory with minimal damage.
Funding: Supported by the National Science Foundation of China under Grant No. 60542004.
Abstract: A Multi-Agent System (MAS) is a promising approach to building complex systems. This paper introduces research on the Inner-Enterprise Credit Rating MAS (IECRMAS). To raise rating accuracy, we consider not only the rating target's information but also the evaluators' feature information, and we propose a rational rating-group formation algorithm based on an anti-bias measurement of the group. We also propose the rational rating individual, which consists of the evaluator and the assistant rating agent. A rational group formation protocol is designed to coordinate autonomous agents to perform the rating job.
Abstract: A multi-agent based manufacturing execution system (MES) model is presented. It is open, modularized, distributed, configurable, integratable, and maintainable. By analyzing the MES domain in manufacturing systems, this paper proposes a multi-agent based MES model, analyzes the partitioned functions of MES in the model using unified modeling language (UML) diagrams, and establishes the MES architecture currently being implemented. This MES can be easily integrated with enterprise resource planning (ERP), the floor control system (FCS), and other manufacturing applications.
Funding: Supported in part by NSFC (62102099, U22A2054, 62101594); in part by the Pearl River Talent Recruitment Program (2021QN02S643); Guangzhou Basic Research Program (2023A04J1699); in part by the National Research Foundation, Singapore; Infocomm Media Development Authority under its Future Communications Research Development Programme; DSO National Laboratories under the AI Singapore Programme under AISG Award No. AISG2-RP-2020-019; Energy Research Test-Bed and Industry Partnership Funding Initiative, Energy Grid (EG) 2.0 programme; DesCartes and the Campus for Research Excellence and Technological Enterprise (CREATE) programme; MOE Tier 1 under Grant RG87/22; in part by the Singapore University of Technology and Design (SUTD) (SRG-ISTD-2021-165); in part by the SUTD-ZJU IDEA Grant SUTD-ZJU (VP) 202102; in part by the Ministry of Education, Singapore, through its SUTD Kickstarter Initiative (SKI 20210204).
Abstract: Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising way to decrease computation overhead and reduce task processing latency, but the high mobility of vehicles makes it challenging for vehicles to make avatar migration decisions independently based on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationary issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management.
Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and effectively reduces approximately 20% of the latency of avatar task execution in UAV-assisted vehicular Metaverses.
Funding: Aeronautical Science Foundation of China (2006ZA51004)
Abstract: Using a multi-agent based modeling approach to complex systems, hierarchical simulation models of carrier-based aircraft catapult launch are developed. The ocean, carrier, aircraft, and atmosphere are treated as aggregation agents, while detailed components such as the catapult, landing gears, and disturbances are treated as meta-agents that belong to their aggregation agent. The model thus has two layers: the aggregation agent layer and the meta-agent layer. The information communication among all agents is described. The meta-agents within one aggregation agent communicate with each other directly by information sharing, whereas meta-agents that belong to different aggregation agents first exchange their information through the aggregation layer and then perceive it from the shared environment, that is, the aggregation agent. Thus, not only is the hierarchical model built, but the environment perceived by each agent is also specified. Meanwhile, the problem of balancing the independence of agents against the resource consumption caused by real-time communication within the multi-agent system (MAS) is resolved. Each agent involved in carrier-based aircraft catapult launch is described, taking into account the interaction between the disturbed atmospheric environment and multiple moving bodies, including the carrier, aircraft, and landing gears. The models of the reactive agents among them are derived based on tensors, and the perceived messages and inner frameworks of each agent are characterized. Finally, some results of a simulation instance are given. Simulating and modeling a dynamic system with a multi-agent system helps express physical concepts and logical hierarchy clearly and precisely. The system model can easily incorporate other kinds of agents to achieve a precise simulation of a more complex system.
This modeling technique decomposes the complex integral dynamic equations of multiple bodies into parallel operations of single agents, and it makes the program code convenient to expand, maintain, and reuse.
Funding: Projects (61075065, 60774045) supported by the National Natural Science Foundation of China; project supported by the Graduate Degree Thesis Innovation Foundation of Central South University, China
Abstract: Consensus tracking control problems for single-integrator dynamics of multi-agent systems with switching topology are investigated. To design effective consensus tracking protocols for a more general class of networks, aimed at ensuring that the concerned states of agents converge to a constant or time-varying reference state, new consensus tracking protocols with a constant and a time-varying reference state are proposed, respectively. In particular, in contrast with the spanning tree condition, an improved condition on the switching interaction topology is presented. Convergence analysis of the two consensus tracking protocols is then provided by Lyapunov stability theory. Moreover, the consensus tracking protocol with a time-varying reference state is extended to achieve formation control. By introducing a formation structure set, each agent can obtain its individual desired trajectory. Finally, several simulations are carried out to illustrate the effectiveness of the theoretical results. The test results show that the states of agents can converge to a desired constant or time-varying reference state. In addition, by selecting an appropriate structure set, agents can maintain the expected formation under random switching interaction topologies.
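The idea of tracking a time-varying reference while holding a formation can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the protocol proposed in the paper: it uses a common textbook-style control law for single-integrator agents, assumes every agent can read the reference `sin(t)` directly, and models the formation structure set as fixed scalar `offsets`. The function name `simulate_tracking`, the gains, and the two example topologies are all hypothetical.

```python
import math

def simulate_tracking(offsets, topologies, x0, k=1.0, dt=0.01,
                      steps=2000, switch_every=100):
    """Forward-Euler simulation of single-integrator agents x_i' = u_i.

    Assumed control law (illustrative, not taken from the paper):
        u_i = r'(t) - k * (z_i - r(t)) - sum_j a_ij * (z_i - z_j),
    where z_i = x_i - offsets[i] is the formation-compensated state and
    r(t) = sin(t) is a reference assumed visible to every agent.
    """
    n = len(x0)
    x = list(x0)
    for step in range(steps):
        t = step * dt
        r, r_dot = math.sin(t), math.cos(t)
        # Switching topology: cycle through the given adjacency matrices.
        A = topologies[(step // switch_every) % len(topologies)]
        z = [x[i] - offsets[i] for i in range(n)]
        u = [
            r_dot - k * (z[i] - r)
            - sum(A[i][j] * (z[i] - z[j]) for j in range(n))
            for i in range(n)
        ]
        x = [x[i] + dt * u[i] for i in range(n)]
    return x, math.sin(steps * dt)

# Two hypothetical interaction graphs that alternate every second.
TOPOLOGY_A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
TOPOLOGY_B = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]

if __name__ == "__main__":
    offsets = [0.0, 1.0, 2.0]
    x, r = simulate_tracking(offsets, [TOPOLOGY_A, TOPOLOGY_B],
                             x0=[3.0, -1.0, 0.5])
    for i, xi in enumerate(x):
        print(f"agent {i}: tracking error = {xi - offsets[i] - r:+.4f}")
```

Because every agent sees the reference in this sketch, convergence does not depend on the connectivity of the switching graphs; the paper's contribution concerns the harder case in which the topology condition matters.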
Funding: Projects (60474029, 60774045, 60604005) supported by the National Natural Science Foundation of China; project supported by the Graduate Degree Thesis Innovation Foundation of Central South University, China
Abstract: This paper focuses on consensus problems for discrete-time multi-agent systems. To design effective consensus protocols that ensure the concerned states of agents converge to a common value, a new consensus protocol for general discrete-time multi-agent systems was proposed based on Lyapunov stability theory. For discrete-time multi-agent systems with a desired trajectory, trajectory tracking and formation control problems were studied. The main idea of the trajectory tracking problem was to design a trajectory controller such that each agent tracked the desired trajectory. For a type of formation problem with fixed formation structure, the formation structure set was introduced. According to the formation structure set, each agent can track its individual desired trajectory. Finally, simulations were provided to demonstrate the effectiveness of the theoretical results. The numerical results show that the states of agents converge to zero with the consensus protocol, which is said to achieve consensus asymptotically. In addition, through designing appropriate trajectory controllers, the simulation results show that agents converge to the desired trajectory asymptotically and can form different formations.
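To make the notion of a discrete-time consensus protocol concrete, here is a minimal sketch of the standard average-consensus update x_i(k+1) = x_i(k) + eps * sum_j a_ij (x_j(k) - x_i(k)). This is the textbook protocol, not the new protocol proposed in the paper, and it drives the states to their initial average rather than to zero. The function names, the path graph, and the step size `eps` are assumptions chosen for illustration; for an undirected connected graph, a step size below 1 / (maximum degree) is a sufficient condition for convergence.

```python
def consensus_step(x, adjacency, eps):
    """One synchronous update of the standard discrete-time consensus rule."""
    n = len(x)
    return [
        x[i] + eps * sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
        for i in range(n)
    ]

def run_consensus(x0, adjacency, eps, steps):
    """Iterate the consensus update for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(x, adjacency, eps)
    return x

# Hypothetical undirected path graph on four agents: 0 - 1 - 2 - 3.
PATH_GRAPH = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    x0 = [4.0, -2.0, 1.0, 5.0]
    x_final = run_consensus(x0, PATH_GRAPH, eps=0.3, steps=200)
    print("initial average:", sum(x0) / len(x0))
    print("final states:   ", [round(v, 6) for v in x_final])
```

With a symmetric adjacency matrix the sum of the states is preserved at every step, which is why the common value is the initial average.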
Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2011AA040103) and the Research Foundation of Shanghai Institute of Technology, China (Grant No. B504)
Abstract: Formation control and obstacle avoidance for multi-agent systems have attracted increasing attention. In this paper, the problems of formation control and obstacle avoidance are investigated by means of a consensus algorithm. A novel distributed control model is proposed for the multi-agent system to form the anticipated formation as well as achieve obstacle avoidance. Based on the consensus algorithm, a distributed control function consisting of three terms (a formation control term, a velocity matching term, and an obstacle avoidance term) is presented. By establishing a novel formation control matrix, a formation control term is constructed such that the agents can converge to consensus and reach the anticipated formation. A new obstacle avoidance function is developed using the modified potential field approach to ensure that obstacle avoidance can be achieved whether the obstacle is moving or stationary. A velocity matching term is also put forward to guarantee that the velocities of all agents converge to the same value. Furthermore, stability of the control model is proven. Simulation results are provided to demonstrate the effectiveness of the proposed control.
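The three-term structure described above (formation control, velocity matching, obstacle avoidance) can be sketched for double-integrator agents in the plane. This is a hedged illustration only: the abstract does not give the paper's formation control matrix or its modified potential function, so the sketch substitutes a plain consensus term on offset-compensated positions and a simple repulsive field that is active within an assumed `radius`. All names, gains, and the example scenario are hypothetical.

```python
import math

def repulsion(p, obstacle, radius=2.0, gain=1.0):
    """Illustrative repulsive acceleration; zero outside the given radius."""
    dx, dy = p[0] - obstacle[0], p[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist >= radius or dist == 0.0:
        return (0.0, 0.0)
    scale = gain * (1.0 / dist - 1.0 / radius) / dist ** 2
    return (scale * dx, scale * dy)

def simulate_formation(pos, vel, offsets, A, obstacle,
                       gamma=1.0, dt=0.01, steps=5000):
    """Forward-Euler simulation of agents with p' = v and v' = u.

    Assumed control: u_i = formation term + velocity matching term
    + obstacle avoidance term, as commented below.
    """
    n = len(pos)
    pos = [tuple(p) for p in pos]
    vel = [tuple(v) for v in vel]
    for _ in range(steps):
        acc = []
        for i in range(n):
            ax = ay = 0.0
            for j in range(n):
                w = A[i][j]
                if not w:
                    continue
                # Formation term: consensus on offset-compensated positions.
                ax -= w * ((pos[i][0] - offsets[i][0]) - (pos[j][0] - offsets[j][0]))
                ay -= w * ((pos[i][1] - offsets[i][1]) - (pos[j][1] - offsets[j][1]))
                # Velocity matching term: pull neighbor velocities together.
                ax -= gamma * w * (vel[i][0] - vel[j][0])
                ay -= gamma * w * (vel[i][1] - vel[j][1])
            # Obstacle avoidance term: push away from a nearby obstacle.
            rx, ry = repulsion(pos[i], obstacle)
            acc.append((ax + rx, ay + ry))
        pos = [(p[0] + dt * v[0], p[1] + dt * v[1]) for p, v in zip(pos, vel)]
        vel = [(v[0] + dt * a[0], v[1] + dt * a[1]) for v, a in zip(vel, acc)]
    return pos, vel

# Hypothetical scenario: three fully connected agents, obstacle far away.
COMPLETE_3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
OFFSETS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

if __name__ == "__main__":
    pos, vel = simulate_formation(
        pos=[(0.0, 0.0), (3.0, 1.0), (-2.0, 4.0)],
        vel=[(0.0, 0.0)] * 3,
        offsets=OFFSETS, A=COMPLETE_3, obstacle=(100.0, 100.0),
    )
    for p in pos:
        print(tuple(round(c, 3) for c in p))
```

In this scenario the obstacle is out of range, so only the first two terms act and the agents settle into the shape given by `OFFSETS`. Moving the obstacle into the agents' path activates the repulsive term, but this sketch makes no claim about collision-free guarantees, which in the paper rest on its own potential function and stability proof.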