Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency. However, the high mobility of vehicles makes it challenging for vehicles to independently make avatar migration decisions that depend on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationary issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution in UAV-assisted vehicular Metaverses by approximately 20%.
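As a rough illustration of the policy-optimization step that MAPPO builds on, the following sketch computes the standard PPO clipped surrogate objective for a batch of samples. It is not the transformer-based variant described in the abstract; the function name `ppo_clip_objective` and the default `eps=0.2` are assumptions made for this example.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    ratios[i]     : pi_new(a|o) / pi_old(a|o) for sample i
    advantages[i] : advantage estimate for sample i
    eps           : clipping range (hypothetical default)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio to [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

In a multi-agent setting, each agent would evaluate such an objective on its own observations, typically with a centralized value function supplying the advantage estimates.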
Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult for tasks with latent variables that agents cannot directly observe. However, most existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, which uses variational inference theory to infer a compact latent-variable representation space from a large amount of offline experience. In addition, we derive the counterfactual policy whose input has no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is treated as an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
Large-scale indoor 3D reconstruction with multiple robots faces challenges in core enabling technologies. This work contributes a framework addressing localization, coordination, and vision processing for multi-agent reconstruction. A system architecture fusing visible light positioning, multi-agent path finding via reinforcement learning, and 360° camera techniques for 3D reconstruction is proposed. Our visible light positioning algorithm leverages existing lighting for centimeter-level localization without additional infrastructure. Meanwhile, a decentralized reinforcement learning approach is developed to solve the multi-agent path finding problem, with communications among agents optimized. Our 3D reconstruction pipeline utilizes equirectangular projection from 360° cameras to facilitate depth-independent reconstruction from posed monocular images using neural networks. Experimental validation demonstrates centimeter-level indoor navigation and 3D scene reconstruction capabilities of our framework. The challenges and limitations stemming from the above enabling technologies are discussed at the end of each corresponding section. In summary, this research advances fundamental techniques for multi-robot indoor 3D modeling, contributing to automated, data-driven applications through coordinated robot navigation, perception, and modeling.
Multi-agent reinforcement learning (MARL) has been a rapidly evolving field. This paper presents a comprehensive survey of MARL and its applications. We trace the historical evolution of MARL, highlight its progress, and discuss related survey works. Then, we review the existing works addressing inherent challenges and those focusing on diverse applications. Some representative stochastic games, MARL means, spatial forms of MARL, and task classification are revisited. We then conduct an in-depth exploration of a variety of challenges encountered in MARL applications. We also address critical operational aspects, such as hyperparameter tuning and computational complexity, which are pivotal in practical implementations of MARL. Afterward, we give a thorough overview of the applications of MARL to intelligent machines and devices, chemical engineering, biotechnology, healthcare, and societal issues, which highlights the extensive potential and relevance of MARL within both current and future technological contexts. Our survey also encompasses a detailed examination of benchmark environments used in MARL research, which are instrumental in evaluating MARL algorithms and demonstrate the adaptability of MARL to diverse application scenarios. Finally, we give our outlook for MARL and discuss related techniques and potential future applications.
The emergence of beyond 5G networks has the potential for seamless and intelligent connectivity on a global scale. Network slicing is crucial in delivering services for different, demanding vertical applications in this context. Next-generation applications have time-sensitive requirements and depend on the most efficient routing path to ensure packets reach their intended destinations. However, the existing IP (Internet Protocol) over a multi-domain network faces challenges in enforcing network slicing due to minimal collaboration and information sharing among network operators. Conventional inter-domain routing methods, like Border Gateway Protocol (BGP), cannot make routing decisions based on performance, which frequently results in traffic flowing across congested paths that are never optimal. To address these issues, we propose CoopAI-Route, a multi-agent cooperative deep reinforcement learning (DRL) system utilizing hierarchical software-defined networks (SDN). This framework enforces network slicing in multi-domain networks and cooperative communication with various administrators to find performance-based routes in intra- and inter-domain. CoopAI-Route employs the Distributed Global Topology (DGT) algorithm to define inter-domain Quality of Service (QoS) paths. CoopAI-Route uses a DRL agent with a message-passing multi-agent Twin-Delayed Deep Deterministic Policy Gradient method to ensure optimal end-to-end routes adapted to the specific requirements of network slicing applications. Our evaluation demonstrates CoopAI-Route’s commendable performance in scalability, link failure handling, and adaptability to evolving topologies compared to state-of-the-art methods.
Blockchain can realize the reliable storage of a large amount of data that is chronologically related and verifiable within the system. This technology has been widely used and has developed rapidly in big data systems across various fields. An increasing number of users are participating in application systems that use blockchain as their underlying architecture. As the number of transactions and the capital involved in blockchain grow, ensuring information security becomes imperative. Addressing the verification of transactional information security and privacy has emerged as a critical challenge. Blockchain-based verification methods can effectively eliminate the need for centralized third-party organizations. However, the efficiency of nodes in storing and verifying blockchain data faces unprecedented challenges. To address this issue, this paper introduces an efficient verification scheme for transaction security. Initially, it presents a node evaluation module to estimate the activity level of user nodes participating in transactions, accompanied by a probabilistic analysis for all transactions. Subsequently, this paper optimizes the conventional transaction organization form, introduces a heterogeneous Merkle tree storage structure, and designs algorithms for constructing these heterogeneous trees. Theoretical analyses and simulation experiments conclusively demonstrate the superior performance of this scheme. When verifying the same number of transactions, the heterogeneous Merkle tree transmits less data and is more efficient than traditional methods. The findings indicate that the heterogeneous Merkle tree structure is suitable for various blockchain applications, including the Internet of Things. This scheme can markedly enhance the efficiency of information verification and bolster the security of distributed systems.
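For context on what a Merkle-tree verification transmits, the sketch below builds a conventional binary Merkle tree and checks a membership proof. This is the traditional baseline structure, not the heterogeneous tree proposed in the abstract; the helper names (`build_levels`, `make_proof`, `verify`) and the rule of duplicating the last node on odd-sized levels are assumptions made for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from the hashed leaves up to the root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(x) for x in leaves]
    levels = []
    while True:
        # Pad odd-sized levels by duplicating the last node (assumed convention).
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def make_proof(levels, index):
    """Collect (sibling_hash, node_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        proof.append((level[index ^ 1], index % 2 == 0))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = h(leaf)
    for sibling, node_is_left in proof:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root
```

A verifier only needs the leaf, the root, and one sibling hash per level, so the proof size grows logarithmically with the number of transactions.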
This paper examines the bipartite consensus problems for nonlinear multi-agent systems in Lurie dynamics form with cooperative and competitive communication between different agents. Based on contraction theory, some new conditions for nonlinear Lurie multi-agent systems to reach bipartite leaderless consensus and bipartite tracking consensus are presented. Compared with traditional methods, this approach reduces the dimension of the conditions, eliminates some restrictions on the system matrix, and extends the range of the nonlinear function. Finally, two numerical examples are provided to illustrate the efficiency of our results.
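To make the notion of bipartite consensus concrete, here is a minimal sketch using linear single-integrator agents on a signed, structurally balanced graph. It does not model the Lurie nonlinearity or the contraction-based conditions from the abstract; the function name `simulate_bipartite`, the example graph, and the Euler step size are assumptions made for this example.

```python
def simulate_bipartite(x0, signed_edges, dt=0.01, steps=2000):
    """Euler simulation of x_i' = sum_j |a_ij| * (sign(a_ij) * x_j - x_i).

    signed_edges: undirected edges (i, j, weight); a positive weight means
    cooperation and a negative weight means competition.
    """
    x = list(x0)
    n = len(x)
    nbrs = [[] for _ in range(n)]
    for i, j, w in signed_edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    for _ in range(steps):
        dx = [
            sum(abs(w) * ((1.0 if w > 0 else -1.0) * x[j] - x[i]) for j, w in nbrs[i])
            for i in range(n)
        ]
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


# Two camps {0, 1} and {2, 3}: cooperative inside a camp, competitive across camps.
edges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, -1.0), (0, 3, -1.0)]
final_states = simulate_bipartite([4.0, 2.0, -1.0, -5.0], edges)
```

On this balanced graph the two camps settle at values of equal magnitude and opposite sign, which is the defining property of bipartite consensus.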
This paper investigates the problem of global/semi-global finite-time consensus for integrator-type multi-agent systems. New hyperbolic tangent function-based protocols are proposed to achieve global and semi-global finite-time consensus for both single-integrator and double-integrator multi-agent systems with leaderless undirected and leader-following directed communication topologies. These new protocols not only provide an explicit upper-bound estimate for the settling time, but also have a user-prescribed bounded control level. In addition, compared to some existing results based on the saturation function, the proposed approach considerably simplifies the protocol design and the stability analysis. Illustrative examples and an application demonstrate the effectiveness of the proposed protocols.
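The bounded-control property mentioned above can be illustrated with a generic tanh-based protocol for single-integrator agents on an undirected graph. This is only a sketch: the protocol below is a simple assumed form that converges asymptotically, and it is not the finite-time protocol of the paper; the name `simulate_tanh_consensus`, the gain, and the path graph are hypothetical.

```python
import math


def simulate_tanh_consensus(x0, edges, gain=1.0, dt=0.01, steps=4000):
    """Euler simulation of u_i = gain * sum_j tanh(x_j - x_i) over neighbors j.

    Because |tanh| <= 1, each control input satisfies |u_i| <= gain * degree(i).
    Returns the final states and the largest control magnitude observed.
    """
    x = list(x0)
    n = len(x)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    max_u = 0.0
    for _ in range(steps):
        u = [gain * sum(math.tanh(x[j] - x[i]) for j in nbrs[i]) for i in range(n)]
        max_u = max(max_u, max(abs(v) for v in u))
        x = [xi + dt * ui for xi, ui in zip(x, u)]
    return x, max_u


# Path graph 0-1-2-3 with widely spread initial states (average is 4).
states, peak_control = simulate_tanh_consensus(
    [10.0, -4.0, 3.0, 7.0], [(0, 1), (1, 2), (2, 3)]
)
```

Since tanh is odd and the graph is undirected, the inputs sum to zero and the state average is preserved, so the agents agree on the initial average while the control effort stays within the degree-scaled bound.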
This paper is concerned with consensus of a second-order linear time-invariant multi-agent system in the situation that there exists a communication delay among the agents in the network. A proportional-integral consensus protocol is designed by using delayed and memorized state information. Under the proportional-integral consensus protocol, the consensus problem of the multi-agent system is transformed into the problem of asymptotic stability of the corresponding linear time-invariant time-delay system. Note that the location of the eigenvalues of the corresponding characteristic function of the linear time-invariant time-delay system not only determines the stability of the system, but also plays a critical role in the dynamic performance of the system. In this paper, based on recent results on the distribution of roots of quasi-polynomials, several necessary conditions for Hurwitz stability for a class of quasi-polynomials are first derived. Then allowable regions of consensus protocol parameters are estimated. Some necessary and sufficient conditions for determining effective protocol parameters are provided. The designed protocol can achieve consensus and improve the dynamic performance of the second-order multi-agent system. Moreover, the effects of delays on consensus of systems of harmonic oscillators/double integrators under proportional-integral consensus protocols are investigated. Furthermore, some results on proportional-integral consensus are derived for a class of high-order linear time-invariant multi-agent systems.
The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates the issue of strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the possible risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of participants’ manipulative behaviors on the state changes of the players. A nonlinear mathematical model is established to describe the strategy decision-making process of the participants in this scenario. Subsequently, the strategy selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors have different effects on the selection and evolutionary speed of the agent’s strategies. Especially in a highly uncertain environment, even small information asymmetry or miscalculation may have a significant impact on decision-making. This also confirms the feasibility and effectiveness of the method proposed in the paper, which can better explain the behavioral decision-making process of the agent in the interaction process. This study provides feasibility analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of the game system model.
In the rapidly evolving landscape of today’s digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
As an important mechanism in multi-agent interaction, communication can make agents form complex team relationships rather than constitute a simple set of multiple independent agents. However, the existing communication schemes can bring much timing redundancy and many irrelevant messages, which seriously affects their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). The SCTC uses a gating mechanism based on state control to reduce the timing redundancy of communication between agents and determines the interaction relationship between agents and the importance weight of a communication message through a series connection of hard- and self-attention mechanisms, realizing targeted communication message processing. In addition, by minimizing the difference between the fusion message generated from the real communication message of each agent and the fusion message generated from the buffered message, the correctness of the final action choice of the agent is ensured. Our evaluation using a challenging set of StarCraft II benchmarks indicates that the SCTC can significantly improve the learning performance and reduce the communication overhead between agents, thus ensuring better cooperation between agents.
This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns due to the unpredictability of agent actions during their exploration phase. This unpredictability can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module specifically designed for voltage control within the MARL framework. This module ensures that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-bus, 141-bus, and 322-bus power systems, employing SC-MARL for voltage control resulted in a reduction of the Voltage Out of Control Rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the Reactive Power Loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.
This paper studies the problem of time-varying formation control with finite-time prescribed performance for nonstrict feedback second-order multi-agent systems with unmeasured states and unknown nonlinearities. To eliminate nonlinearities, neural networks are applied to approximate the inherent dynamics of the system. In addition, due to the limitations of the actual working conditions, each follower agent can only obtain the locally measurable partial state information of the leader agent. To address this problem, a neural network state observer based on the leader state information is designed. Then, a finite-time prescribed performance adaptive output feedback control strategy is proposed by restricting the sliding mode surface to a prescribed region, which ensures that the closed-loop system has practical finite-time stability and that formation errors of the multi-agent systems converge to the prescribed performance bound in finite time. Finally, a numerical simulation is provided to demonstrate the practicality and effectiveness of the developed algorithm.
Wind-photovoltaic (PV)-hydrogen-storage multi-agent energy systems are expected to play an important role in promoting renewable power utilization and decarbonization. In this study, a coordinated operation method was proposed for a wind-PV-hydrogen-storage multi-agent energy system. First, a coordinated operation model was formulated for each agent considering peer-to-peer power trading. Second, a coordinated operation interactive framework for a multi-agent energy system was proposed based on the theory of the alternating direction method of multipliers. Third, a distributed interactive algorithm was proposed to protect the privacy of each agent and solve coordinated operation strategies. Finally, the effectiveness of the proposed coordinated operation method was tested on multi-agent energy systems with different structures, and the operational revenues of the wind power, PV, hydrogen, and energy storage agents of the proposed coordinated operation model were improved by approximately 59.19%, 233.28%, 16.75%, and 145.56%, respectively, compared with the independent operation model.
The rapid growth of network services leads to long service request processing times and high deployment costs when deploying network function virtualization service function chains (SFCs) in 5G networks. To address these problems, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). First, an SFC optimization model that enhances the request acceptance rate while minimizing latency and deployment cost is constructed for the resource-constrained network case. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, effectively increases the request acceptance rate compared with the baseline algorithms, reducing the end-to-end latency by 4.942% and the deployment cost by 8.045%.
Battery energy storage systems (BESSs) are widely used in smart grids. However, the power consumed by inner impedance and the capacity degradation of each battery unit become particularly severe, which has resulted in an increase in operating costs. The general economic dispatch (ED) algorithm based on marginal cost (MC) consensus is usually a proportional (P) controller, which suffers from slow convergence and low control accuracy. To solve the distributed ED problem of the isolated BESS network with excellent dynamic and steady-state performance, in this paper we design a proportional-integral (PI) controller with a reset mechanism (PI+R) to asymptotically drive the MC towards consensus and the total power mismatch towards 0. Specifically, the integral term in the PI controller is reset to 0 at an appropriate time when the proportional term undergoes a zero crossing, which accelerates convergence, improves control accuracy, and avoids overshoot. The eigenvalues of the system under a PI+R controller are analyzed, ensuring the regularity of the system and enabling the reset mechanism. To ensure supply and demand balance within the isolated BESSs, a centralized reset mechanism is introduced, so that the controller is distributed in a flow set and centralized in a jump set. To cope with Zeno behavior and input delay, a dwell time during which the system resides in a flow set is given. Based on this, the system with input delays can be reduced to a time-delay-free system. Considering the capacity limitation of the battery, a modified MC scheme with a PI+R controller is designed. The correctness of the designed scheme is verified through relevant simulations.
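The reset idea described above (zeroing the integral term when the proportional term crosses zero) can be illustrated on a toy scalar loop. The sketch below is not the distributed ED scheme of the paper: it assumes a single integrator plant x' = u tracking a constant reference, and the name `simulate_pi`, the gains, and the step size are hypothetical choices for this example.

```python
def simulate_pi(kp=1.0, ki=4.0, dt=0.001, steps=10000, reset=True):
    """Simulate a PI loop on the plant x' = u, written in terms of the error e.

    With e = reference - x, the error obeys e' = -u, where u = kp*e + ki*z
    and z is the integral of e. If reset is True, z is set to 0 whenever
    the error (and hence the proportional term kp*e) crosses zero.
    Returns the final error and the worst overshoot past zero.
    """
    e, z = 1.0, 0.0
    worst_overshoot = 0.0
    for _ in range(steps):
        u = kp * e + ki * z
        e_next = e - dt * u  # explicit Euler step of e' = -u
        z += dt * e          # integrate the error
        if reset and e * e_next <= 0.0:
            z = 0.0          # reset the integral state at the zero crossing
        e = e_next
        worst_overshoot = max(worst_overshoot, -e)
    return e, worst_overshoot


error_with_reset, overshoot_with_reset = simulate_pi(reset=True)
error_plain, overshoot_plain = simulate_pi(reset=False)
```

With these underdamped gains the plain PI loop swings well past the reference, whereas the reset variant discards the accumulated integral at the crossing and settles with almost no overshoot.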
With the introduction of the “dual carbon” goal and the continuous promotion of low-carbon development, the integrated energy system (IES) has gradually become an effective way to save energy and reduce emissions. This study proposes a low-carbon economic optimization scheduling model for an IES that considers carbon trading costs. With the goal of minimizing the total operating cost of the IES and considering the transferable and curtailable characteristics of the electric and thermal flexible loads, an optimal scheduling model of the IES that considers the cost of carbon trading and flexible loads on the user side was established. The role of flexible loads in improving the economy of an energy system was investigated using examples, and the rationality and effectiveness of the study were verified through a comparative analysis of different scenarios. The results showed that the total cost of the system in different scenarios was reduced by 18.04%, 9.1%, 3.35%, and 7.03%, respectively, whereas the total carbon emissions of the system were reduced by 65.28%, 20.63%, 3.85%, and 18.03%, respectively, when the carbon trading cost and demand-side flexible electric and thermal load responses were considered simultaneously. Flexible electrical and thermal loads did not have the same impact on the system performance. In the analyzed case, the total cost and carbon emissions of the system when only the flexible electrical load response was considered were lower than those when only the flexible thermal load response was taken into account. Photovoltaics have an excess of carbon trading credits and can profit from selling them, whereas other devices exceed their carbon allowances and need to buy carbon credits.
Funding (UAV-assisted vehicular Metaverse avatar task migration study): Supported in part by NSFC (62102099, U22A2054, 62101594); in part by the Pearl River Talent Recruitment Program (2021QN02S643); Guangzhou Basic Research Program (2023A04J1699); in part by the National Research Foundation, Singapore; Infocomm Media Development Authority under its Future Communications Research Development Programme; DSO National Laboratories under the AI Singapore Programme under AISG Award No AISG2-RP-2020-019; Energy Research Test-Bed and Industry Partnership Funding Initiative, Energy Grid (EG) 2.0 programme; DesCartes and the Campus for Research Excellence and Technological Enterprise (CREATE) programme; MOE Tier 1 under Grant RG87/22; in part by the Singapore University of Technology and Design (SUTD) (SRG-ISTD-2021-165); in part by the SUTD-ZJU IDEA Grant SUTD-ZJU (VP) 202102; and in part by the Ministry of Education, Singapore, through its SUTD Kickstarter Initiative (SKI 20210204).
Abstract: Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency. However, the high mobility of vehicles makes it challenging for vehicles to independently make avatar migration decisions that depend on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationary issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
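The abstract above builds on proximal policy optimization. As a rough illustration only, the following is a minimal sketch of the standard PPO clipped surrogate objective for one agent's batch of samples. The function name `ppo_clip_objective`, its parameters, and the default `clip_eps=0.2` are assumptions made for this example; the abstract does not give the paper's exact loss, its multi-agent extensions, or its transformer-based variant.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current and the data-collecting policy (hypothetical inputs).
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|o) / pi_old(a|o)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside the interval [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a multi-agent setting such as MAPPO, an objective of this shape would typically be evaluated per agent with advantages from a centralized critic; that wiring is omitted here.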
Funding: Supported in part by the National Natural Science Foundation of China (62136008, 62236002, 61921004, 62173251, 62103104); the “Zhishan” Scholars Programs of Southeast University; the Fundamental Research Funds for the Central Universities (2242023K30034).
Abstract: Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult for those tasks with latent variables that agents cannot directly observe. However, most of the existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, which uses variational inference theory to infer a compact latent-variable representation space from a large amount of offline experience. In addition, we derive the counterfactual policy whose input has no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is considered an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
Funding: Supported by Bright Dream Robotics and the HKUST-BDR Joint Research Institute Funding Scheme under Project HBJRI-FTP-005 (Automated 3D Reconstruction using Robot-mounted 360-Degree Camera with Visible Light Positioning Technology for Building Information Modelling Applications, OKT22EG06).
Abstract: Large-scale indoor 3D reconstruction with multiple robots faces challenges in core enabling technologies. This work contributes to a framework addressing localization, coordination, and vision processing for multi-agent reconstruction. A system architecture fusing visible light positioning, multi-agent path finding via reinforcement learning, and 360° camera techniques for 3D reconstruction is proposed. Our visible light positioning algorithm leverages existing lighting for centimeter-level localization without additional infrastructure. Meanwhile, a decentralized reinforcement learning approach is developed to solve the multi-agent path finding problem, with communications among agents optimized. Our 3D reconstruction pipeline utilizes equirectangular projection from 360° cameras to facilitate depth-independent reconstruction from posed monocular images using neural networks. Experimental validation demonstrates centimeter-level indoor navigation and 3D scene reconstruction capabilities of our framework. The challenges and limitations stemming from the above enabling technologies are discussed at the end of each corresponding section. In summary, this research advances fundamental techniques for multi-robot indoor 3D modeling, contributing to automated, data-driven applications through coordinated robot navigation, perception, and modeling.
Funding: Ministry of Education, Singapore, under AcRF TIER 1 Grant RG64/23; the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a Schmidt Futures program, USA.
Abstract: Multi-agent reinforcement learning (MARL) has been a rapidly evolving field. This paper presents a comprehensive survey of MARL and its applications. We trace the historical evolution of MARL, highlight its progress, and discuss related survey works. Then, we review the existing works addressing inherent challenges and those focusing on diverse applications. Some representative stochastic games, MARL means, spatial forms of MARL, and task classification are revisited. We then conduct an in-depth exploration of a variety of challenges encountered in MARL applications. We also address critical operational aspects, such as hyperparameter tuning and computational complexity, which are pivotal in practical implementations of MARL. Afterward, we give a thorough overview of the applications of MARL to intelligent machines and devices, chemical engineering, biotechnology, healthcare, and societal issues, which highlights the extensive potential and relevance of MARL within both current and future technological contexts. Our survey also encompasses a detailed examination of benchmark environments used in MARL research, which are instrumental in evaluating MARL algorithms and demonstrate the adaptability of MARL to diverse application scenarios. Finally, we give our outlook for MARL and discuss related techniques and potential future applications.
Abstract: The emergence of beyond 5G networks has the potential for seamless and intelligent connectivity on a global scale. Network slicing is crucial in delivering services for different, demanding vertical applications in this context. Next-generation applications have time-sensitive requirements and depend on the most efficient routing path to ensure packets reach their intended destinations. However, the existing IP (Internet Protocol) over a multi-domain network faces challenges in enforcing network slicing due to minimal collaboration and information sharing among network operators. Conventional inter-domain routing methods, like Border Gateway Protocol (BGP), cannot make routing decisions based on performance, which frequently results in traffic flowing across congested paths that are never optimal. To address these issues, we propose CoopAI-Route, a multi-agent cooperative deep reinforcement learning (DRL) system utilizing hierarchical software-defined networks (SDN). This framework enforces network slicing in multi-domain networks and cooperative communication with various administrators to find performance-based intra- and inter-domain routes. CoopAI-Route employs the Distributed Global Topology (DGT) algorithm to define inter-domain Quality of Service (QoS) paths. CoopAI-Route uses a DRL agent with a message-passing multi-agent Twin-Delayed Deep Deterministic Policy Gradient method to ensure optimal end-to-end routes adapted to the specific requirements of network slicing applications. Our evaluation demonstrates CoopAI-Route's commendable performance in scalability, link failure handling, and adaptability to evolving topologies compared to state-of-the-art methods.
Funding: Funded by the National Natural Science Foundation of China (62072056, 62172058); the Researchers Supporting Project Number (RSP2023R102), King Saud University, Riyadh, Saudi Arabia; the Hunan Provincial Key Research and Development Program (2022SK2107, 2022GK2019); the Natural Science Foundation of Hunan Province (2023JJ30054); the Foundation of State Key Laboratory of Public Big Data (PBD2021-15); the Young Doctor Innovation Program of Zhejiang Shuren University (2019QC30); Postgraduate Scientific Research Innovation Project of Hunan Province (CX20220940, CX20220941).
Abstract: Blockchain can realize the reliable storage of a large amount of data that is chronologically related and verifiable within the system. This technology has been widely used and has developed rapidly in big data systems across various fields. An increasing number of users are participating in application systems that use blockchain as their underlying architecture. As the number of transactions and the capital involved in blockchain grow, ensuring information security becomes imperative. Addressing the verification of transactional information security and privacy has emerged as a critical challenge. Blockchain-based verification methods can effectively eliminate the need for centralized third-party organizations. However, the efficiency of nodes in storing and verifying blockchain data faces unprecedented challenges. To address this issue, this paper introduces an efficient verification scheme for transaction security. Initially, it presents a node evaluation module to estimate the activity level of user nodes participating in transactions, accompanied by a probabilistic analysis for all transactions. Subsequently, this paper optimizes the conventional transaction organization form, introduces a heterogeneous Merkle tree storage structure, and designs algorithms for constructing these heterogeneous trees. Theoretical analyses and simulation experiments conclusively demonstrate the superior performance of this scheme. When verifying the same number of transactions, the heterogeneous Merkle tree transmits less data and is more efficient than traditional methods. The findings indicate that the heterogeneous Merkle tree structure is suitable for various blockchain applications, including the Internet of Things. This scheme can markedly enhance the efficiency of information verification and bolster the security of distributed systems.
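For orientation, the sketch below shows transaction verification with a conventional binary Merkle tree, i.e. the traditional baseline that the abstract compares against. It is not the heterogeneous Merkle tree of the paper, whose construction the abstract does not describe. The function names, the choice of SHA-256, and the rule of pairing an unpaired node with itself are assumptions made for this example; the input is assumed to contain at least one transaction.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the single root."""
    level = [_h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair an unpaired node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels, index):
    """Collect (sibling_hash, sibling_is_on_right) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # unpaired node was hashed with its own copy
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf and its proof path."""
    node = _h(leaf)
    for sibling, sibling_on_right in proof:
        node = _h(node + sibling) if sibling_on_right else _h(sibling + node)
    return node == root
```

A verifier needs only the leaf, the root, and one sibling hash per level, so the transmitted proof grows with the tree height rather than with the number of transactions; this is the data volume that the abstract reports reducing further.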
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 62363005), the Jiangxi Provincial Natural Science Foundation (Grant Nos. 20161BAB212032 and 20232BAB202034), and the Science and Technology Research Project of Jiangxi Provincial Department of Education (Grant Nos. GJJ202602 and GJJ202601).
Abstract: This paper examines the bipartite consensus problems for nonlinear multi-agent systems in Lurie dynamics form with cooperative and competitive communication between different agents. Based on contraction theory, some new conditions are presented under which the nonlinear Lurie multi-agent systems reach bipartite leaderless consensus and bipartite tracking consensus. Compared with traditional methods, this approach reduces the dimensions of the conditions, eliminates some restrictions on the system matrix, and extends the range of the nonlinear function. Finally, two numerical examples are provided to illustrate the efficiency of our results.
Funding: Supported by the National Natural Science Foundation of China (62073019).
Abstract: This paper investigates the problem of global/semi-global finite-time consensus for integrator-type multi-agent systems. New hyperbolic tangent function-based protocols are proposed to achieve global and semi-global finite-time consensus for both single-integrator and double-integrator multi-agent systems with leaderless undirected and leader-following directed communication topologies. These new protocols not only provide an explicit upper-bound estimate for the settling time, but also have a user-prescribed bounded control level. In addition, compared to some existing results based on the saturation function, the proposed approach considerably simplifies the protocol design and the stability analysis. Illustrative examples and an application demonstrate the effectiveness of the proposed protocols.
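To give a feel for why a hyperbolic tangent yields a bounded control level, the sketch below simulates single-integrator agents on an undirected path graph under the illustrative law u_i = -k tanh(sum_j a_ij (x_i - x_j)), so that |u_i| never exceeds k. This law, the graph, the gain `k`, and the Euler step are all assumptions chosen for the example. The abstract does not state the paper's actual protocols, and this simplified form only converges asymptotically; it does not reproduce the finite-time property or the settling-time bound.

```python
import math


def simulate_tanh_consensus(x0, edges, k=1.0, dt=0.01, steps=5000):
    """Euler simulation of x_i' = u_i with u_i = -k * tanh(sum_j (x_i - x_j)).

    x0: initial states; edges: undirected (i, j) pairs (hypothetical graph).
    Returns the final states and the largest |u_i| seen during the run.
    """
    x = list(x0)
    neighbors = {i: [] for i in range(len(x))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    max_u = 0.0
    for _ in range(steps):
        u = []
        for i in range(len(x)):
            disagreement = sum(x[i] - x[j] for j in neighbors[i])
            u.append(-k * math.tanh(disagreement))  # bounded by k
        max_u = max(max_u, max(abs(v) for v in u))
        x = [xi + dt * ui for xi, ui in zip(x, u)]
    return x, max_u


final_states, peak_input = simulate_tanh_consensus(
    [5.0, -3.0, 0.0, 8.0], [(0, 1), (1, 2), (2, 3)]
)
```

With these settings the agents' states end up close together while the input magnitude stays at or below `k`, regardless of how far apart the agents start.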
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) (61703086, 61773106); the IAPI Fundamental Research Funds (2018ZCX27).
Abstract: This paper is concerned with consensus of a second-order linear time-invariant multi-agent system in the situation that there exists a communication delay among the agents in the network. A proportional-integral consensus protocol is designed by using delayed and memorized state information. Under the proportional-integral consensus protocol, the consensus problem of the multi-agent system is transformed into the problem of asymptotic stability of the corresponding linear time-invariant time-delay system. Note that the location of the eigenvalues of the corresponding characteristic function of the linear time-invariant time-delay system not only determines the stability of the system, but also plays a critical role in the dynamic performance of the system. In this paper, based on recent results on the distribution of roots of quasi-polynomials, several necessary conditions for Hurwitz stability for a class of quasi-polynomials are first derived. Then allowable regions of consensus protocol parameters are estimated. Some necessary and sufficient conditions for determining effective protocol parameters are provided. The designed protocol can achieve consensus and improve the dynamic performance of the second-order multi-agent system. Moreover, the effects of delays on consensus of systems of harmonic oscillators/double integrators under proportional-integral consensus protocols are investigated. Furthermore, some results on proportional-integral consensus are derived for a class of high-order linear time-invariant multi-agent systems.
Abstract: The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates the issue of strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the possible risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of participants' manipulative behaviors on the state changes of the players. A nonlinear mathematical model is established to describe the strategy decision-making process of the participants in this scenario. Subsequently, the strategy selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors have different effects on the selection and evolutionary speed of the agent's strategies. Especially in a highly uncertain environment, even small information asymmetry or miscalculation may have a significant impact on decision-making. This also confirms the feasibility and effectiveness of the method proposed in the paper, which can better explain the behavioral decision-making process of the agent in the interaction process. This study provides feasibility analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of the game system model.
Funding: This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant No. (IFPIP-1127-611-1443). The authors, therefore, acknowledge with thanks DSR technical and financial support.
Abstract: In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
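The abstract reports precision and F1 score but not recall. Assuming the standard definition of F1 as the harmonic mean of precision P and recall R, and assuming both figures come from the same evaluation, the implied recall can be worked out as follows. This is a derived estimate, not a value reported in the abstract.

```latex
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
R = \frac{F_1\,P}{2P - F_1}
  = \frac{0.86 \times 0.88}{2(0.88) - 0.86}
  = \frac{0.7568}{0.90}
  \approx 0.84
```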
Abstract: As an important mechanism in multi-agent interaction, communication can make agents form complex team relationships rather than constitute a simple set of multiple independent agents. However, the existing communication schemes can bring much timing redundancy and irrelevant messages, which seriously affects their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). The SCTC uses a gating mechanism based on state control to reduce the timing redundancy of communication between agents and determines the interaction relationship between agents and the importance weight of a communication message through a series connection of hard- and self-attention mechanisms, realizing targeted communication message processing. In addition, by minimizing the difference between the fusion message generated from a real communication message of each agent and a fusion message generated from the buffered message, the correctness of the final action choice of the agent is ensured. Our evaluation using a challenging set of StarCraft II benchmarks indicates that the SCTC can significantly improve the learning performance and reduce the communication overhead between agents, thus ensuring better cooperation between agents.
Funding: “Regional Innovation Strategy (RIS)” through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (MOE) (2021RIS-002).
Abstract: This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns due to the unpredictability in agent actions during their exploration phase. This unpredictability can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module specifically designed for voltage control within the MARL framework. This module ensures that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-bus, 141-bus, and 322-bus power systems, employing SC-MARL for voltage control resulted in a reduction of the Voltage Out of Control Rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the Reactive Power Loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.
Funding: The National Natural Science Foundation of China (62203356); Fundamental Research Funds for the Central Universities of China (31020210502002).
Abstract: This paper studies the problem of time-varying formation control with finite-time prescribed performance for nonstrict feedback second-order multi-agent systems with unmeasured states and unknown nonlinearities. To eliminate nonlinearities, neural networks are applied to approximate the inherent dynamics of the system. In addition, due to the limitations of the actual working conditions, each follower agent can only obtain the locally measurable partial state information of the leader agent. To address this problem, a neural network state observer based on the leader state information is designed. Then, a finite-time prescribed performance adaptive output feedback control strategy is proposed by restricting the sliding mode surface to a prescribed region, which ensures that the closed-loop system has practical finite-time stability and that formation errors of the multi-agent systems converge to the prescribed performance bound in finite time. Finally, a numerical simulation is provided to demonstrate the practicality and effectiveness of the developed algorithm.
Funding: Supported by the Key Research and Development Program of Jiangsu Provincial Department of Science and Technology (BE2020081).
Abstract: Wind-photovoltaic (PV)-hydrogen-storage multi-agent energy systems are expected to play an important role in promoting renewable power utilization and decarbonization. In this study, a coordinated operation method was proposed for a wind-PV-hydrogen-storage multi-agent energy system. First, a coordinated operation model was formulated for each agent considering peer-to-peer power trading. Second, a coordinated operation interactive framework for a multi-agent energy system was proposed based on the theory of the alternating direction method of multipliers. Third, a distributed interactive algorithm was proposed to protect the privacy of each agent and solve coordinated operation strategies. Finally, the effectiveness of the proposed coordinated operation method was tested on multi-agent energy systems with different structures, and the operational revenues of the wind power, PV, hydrogen, and energy storage agents of the proposed coordinated operation model were improved by approximately 59.19%, 233.28%, 16.75%, and 145.56%, respectively, compared with the independent operation model.
Funding: The financial support from the Major Science and Technology Programs in Henan Province (Grant No. 241100210100); National Natural Science Foundation of China (Grant No. 62102372); Henan Provincial Department of Science and Technology Research Project (Grant No. 242102211068); Henan Provincial Department of Science and Technology Research Project (Grant No. 232102210078); the Stabilization Support Program of The Shenzhen Science and Technology Innovation Commission (Grant No. 20231130110921001); the Key Scientific Research Project of Higher Education Institutions of Henan Province (Grant No. 24A520042) is acknowledged.
Abstract: The rapid growth of network services leads to long service request processing times and high deployment costs when deploying network function virtualization service function chains (SFCs) in 5G networks. To address these problems, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). Initially, an optimization model is constructed for the network resource-constrained case to enhance the request acceptance rate while minimizing latency and SFC deployment cost. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, effectively increases the acceptance rate of requests compared to the comparative algorithms, reducing the end-to-end latency by 4.942% and the deployment cost by 8.045%.
Funding: Supported by the National Natural Science Foundation of China (62103203); the General Terminal IC Interdisciplinary Science Center of Nankai University.
Abstract: Battery energy storage systems (BESSs) are widely used in smart grids. However, power consumed by inner impedance and the capacity degradation of each battery unit become particularly severe, which has resulted in an increase in operating costs. The general economic dispatch (ED) algorithm based on marginal cost (MC) consensus is usually a proportional (P) controller, which suffers from slow convergence speed and low control accuracy. In order to solve the distributed ED problem of the isolated BESS network with excellent dynamic and steady-state performance, in this paper we design a proportional-integral (PI) controller with a reset mechanism (PI+R) to asymptotically drive the MC consensus error and total power mismatch towards 0. Specifically, the integral term in the PI controller is reset to 0 at an appropriate time when the proportional term undergoes a zero crossing, which accelerates convergence, improves control accuracy, and avoids overshoot. The eigenvalues of the system under a PI+R controller are analyzed, ensuring the regularity of the system and enabling the reset mechanism. To ensure supply and demand balance within the isolated BESSs, a centralized reset mechanism is introduced, so that the controller is distributed in a flow set and centralized in a jump set. To cope with Zeno behavior and input delay, a dwell time that the system resides in a flow set is given. Based on this, the system with input delays can be reduced to a time-delay free system. Considering the capacity limitation of the battery, a modified MC scheme with PI+R controller is designed. The correctness of the designed scheme is verified through relevant simulations.
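The reset idea can be illustrated on a toy problem. The sketch below is a minimal scalar example, assuming a single integrator plant x' = u tracking a constant target with u = kp*e + ki*z, where z is the integrated error and is zeroed whenever the error e changes sign. The plant, gains, step size, and the function name `simulate_pi` are assumptions for this example; it is not the distributed economic dispatch scheme, the centralized jump set, or the dwell-time design of the paper.

```python
def simulate_pi(kp=1.0, ki=4.0, reset=False, dt=1e-3, steps=10000, target=1.0):
    """Euler simulation of x' = kp*e + ki*z with e = target - x and z' = e.

    With reset=True the integral state z is set to 0 whenever the
    proportional error e crosses zero. Returns the final state and the
    largest overshoot above the target.
    """
    x, z = 0.0, 0.0
    prev_e = target - x
    max_overshoot = 0.0
    for _ in range(steps):
        e = target - x
        if reset and prev_e != 0.0 and e * prev_e <= 0.0:
            z = 0.0  # discard the accumulated integral at the zero crossing
        u = kp * e + ki * z
        x += dt * u
        z += dt * e
        prev_e = e
        max_overshoot = max(max_overshoot, x - target)
    return x, max_overshoot


x_plain, overshoot_plain = simulate_pi(reset=False)
x_reset, overshoot_reset = simulate_pi(reset=True)
```

Without the reset, the integral accumulated while approaching the target keeps pushing the state past it. With the reset, the push is removed at the crossing, so the overshoot in this toy run shrinks to roughly the size of one integration step.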
Funding: Supported by the State Grid Shanxi Electric Power Company Science and Technology Project “Research on key technologies of carbon tracking and carbon evaluation for new power system” (Grant: 520530230005).
Abstract: With the introduction of the “dual carbon” goal and the continuous promotion of low-carbon development, the integrated energy system (IES) has gradually become an effective way to save energy and reduce emissions. This study proposes a low-carbon economic optimization scheduling model for an IES that considers carbon trading costs. With the goal of minimizing the total operating cost of the IES and considering the transferable and curtailable characteristics of the electric and thermal flexible loads, an optimal scheduling model of the IES that considers the cost of carbon trading and flexible loads on the user side was established. The role of flexible loads in improving the economy of an energy system was investigated using examples, and the rationality and effectiveness of the study were verified through a comparative analysis of different scenarios. The results showed that the total cost of the system in different scenarios was reduced by 18.04%, 9.1%, 3.35%, and 7.03%, respectively, whereas the total carbon emissions of the system were reduced by 65.28%, 20.63%, 3.85%, and 18.03%, respectively, when the carbon trading cost and demand-side flexible electric and thermal load responses were considered simultaneously. Flexible electrical and thermal loads did not have the same impact on the system performance. In the analyzed case, the total cost and carbon emissions of the system when only the flexible electrical load response was considered were lower than those when only the flexible thermal load response was taken into account. Photovoltaics have an excess of carbon trading credits and can profit from selling them, whereas other devices have an excess of carbon trading and need to buy carbon credits.