Funding: National Natural Science Foundation of China, Grant/Award Number: 61872171; The Belt and Road Special Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Grant/Award Number: 2021490811.
Abstract: Multi-agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high-dimensional continuous spaces, the non-stationary environment can provide outdated experiences that hinder convergence, resulting in ineffective training performance for multi-agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed that generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by engaging in actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, allowing agents to learn and reuse complex action sequences and accelerating the convergence speed of multi-agent learning. MioDSC was evaluated in the multi-agent particle environment and the StarCraft multi-agent challenge at varying difficulty levels. The experimental results demonstrate that MioDSC outperforms state-of-the-art methods and is robust across various multi-agent system tasks with high stability.
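As a rough illustration of the quantity such an intrinsic reward is built on, the sketch below estimates the mutual information I(S; A) between discrete states and actions from sampled (state, action) pairs. It is a minimal count-based estimator under assumed discrete spaces, not the estimator used by MioDSC; the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information I(S; A) in nats from (state, action) samples.

    Assumption: states and actions are discrete and hashable. A method for
    continuous spaces would need a learned or variational estimator instead.
    """
    n = len(pairs)
    joint = Counter(pairs)                    # counts of each (s, a) pair
    count_s = Counter(s for s, _ in pairs)    # marginal counts of states
    count_a = Counter(a for _, a in pairs)    # marginal counts of actions
    mi = 0.0
    for (s, a), c in joint.items():
        p_sa = c / n
        # p(s, a) / (p(s) * p(a)), written with raw counts
        ratio = p_sa * n * n / (count_s[s] * count_a[a])
        mi += p_sa * math.log(ratio)
    return mi
```

When actions are fully determined by the state the estimate reaches the entropy of the action distribution, and when actions are independent of the state it is zero, so a reward proportional to this value favours behaviour that is coupled to the environment state.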
Abstract: The increasing trend toward dematerialization and digitalization has prompted a surge in the adoption of IT service providers, offering cost-effective alternatives to traditional local services. Consequently, cloud services have become prevalent across various industries. While these services offer undeniable benefits, they face significant threats, particularly concerning the sensitivity of the data they handle. Many existing mathematical models struggle to accurately depict the complex scenarios of cloud systems. In response to this challenge, this paper proposes a behavioral model for ransomware propagation within such environments. In this model, each component of the environment is defined as an agent responsible for monitoring the propagation of malware. Given the distinct characteristics and criticality of these agents, the impact of malware can vary significantly. Attack scenarios are constructed based on real-world vulnerabilities documented in the Common Vulnerabilities and Exposures (CVEs) through the National Vulnerability Database. Defender actions are guided by an Intrusion Detection System (IDS) guideline. This research aims to provide a comprehensive framework for understanding and addressing ransomware threats in cloud systems. By leveraging an agent-based approach and real-world vulnerability data, our model offers valuable insights into detection and mitigation strategies for safeguarding sensitive cloud-based assets.
Abstract: PDM (product data management) is a software- and database-based technique that integrates the information and processes related to products. PDM alone, however, is not enough to handle the complexity of its use in enterprises, so a mechanism that harmonizes all kinds of information and processes is needed. The paper introduces a novel approach to implementing intelligent monitoring of PDM based on MAS (multi-agent system). Information and processes are managed by an MC (monitor center). The paper first puts forward the architecture of the whole system, then defines the structure of the MC and its interoperation mode.
Funding: Supported by National Basic Research Program of China (973 Program) (2010CB731800), Key Project of Natural Science Foundation of China (60934003), National Natural Science Foundation of China (61074065, 60974018), Natural Science Foundation of Hebei Province (F2012203119), and the Science Foundation of Yanshan University for the Excellent Ph.D. Students (201204). The authors thank Chen Cai-Lian of the Shanghai Jiao Tong University for her comments on English polishing and problem formulation.
Funding: Supported in part by the National Natural Science Foundation of China (61503194, 61533010, 61374055), the Ph.D. Programs Foundation of Ministry of Education of China (20110142110036), the Natural Science Foundation of Jiangsu Province (BK20131381, BK20140877), China Postdoctoral Science Foundation (2015M571788), Jiangsu Planned Projects for Postdoctoral Research Funds (1402066B), the Foundation of the Key Laboratory of Marine Dynamic Simulation and Control for the Ministry of Transport (DMU) (DMU MSCKLT2016005), Jiangsu Government Scholarship for Overseas Studies (2017-037), the Key University Natural Science Research Project of Jiangsu Province (17KJA120003), and the Scientific Foundation of Nanjing University of Posts and Telecommunications (NUPTSF) (NY214076).
Abstract: In this paper, we consider a consensus tracking problem for a class of networked multi-agent systems (MASs) in non-affine pure-feedback form under a directed topology. A distributed adaptive tracking consensus control scheme is constructed recursively using the backstepping method, graph theory, neural networks (NNs) and the dynamic surface control (DSC) approach. The key advantage of the proposed control strategy is that, through the DSC technique, it avoids the "explosion of complexity" problem that arises as the degree of individual agents increases, and thus the computational burden of the scheme can be drastically reduced. Moreover, because NN approximation is employed, no prior knowledge of the system parameters or the uncertain dynamics of individual agents is required. We then further show that, in theory, the designed control policy guarantees that the consensus errors are cooperatively semi-globally uniformly ultimately bounded (CSUUB). Finally, two examples are presented to validate the effectiveness of the proposed control strategy.
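The usual way DSC avoids the "explosion of complexity" is to pass each virtual control signal through a first-order low-pass filter, so that the filter state and its derivative are used in place of an analytically differentiated virtual control. The sketch below shows only that filter step, discretized with forward Euler. It is a generic illustration rather than the scheme from the abstract; the name `dsc_filter` and the values of `tau` and `dt` are assumptions.

```python
def dsc_filter(alpha, tau=0.05, dt=0.001):
    """Filter samples of a virtual control through tau * dz/dt + z = alpha.

    Returns a list of (z, dz) pairs, one per input sample, where z is the
    filtered signal and dz is its derivative. The filter starts at
    z(0) = alpha(0), the standard DSC initialisation.
    Assumption: forward-Euler integration with step dt, which needs dt << tau.
    """
    z = alpha[0]
    trace = []
    for a in alpha:
        dz = (a - z) / tau      # filter dynamics: dz/dt = (alpha - z) / tau
        trace.append((z, dz))
        z += dt * dz            # forward-Euler update of the filter state
    return trace
```

A smaller `tau` makes the filtered signal follow the virtual control more closely at the cost of larger derivative values, which is the main tuning trade-off in this step.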
Funding: Supported by National Natural Science Foundation of China (61079001, 61273006), National High Technology Research and Development Program of China (863 Program) (2011AA110301), and Specialized Research Fund for the Doctoral Program of Higher Education of China (20111103110017).
Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2011AA040103) and the Research Foundation of Shanghai Institute of Technology, China (Grant No. B504).
Abstract: Formation control and obstacle avoidance for multi-agent systems have attracted increasing attention. In this paper, the problems of formation control and obstacle avoidance are investigated by means of a consensus algorithm. A novel distributed control model is proposed for the multi-agent system to form the anticipated formation as well as achieve obstacle avoidance. Based on the consensus algorithm, a distributed control function consisting of three terms (formation control term, velocity matching term, and obstacle avoidance term) is presented. By establishing a novel formation control matrix, a formation control term is constructed such that the agents can converge to consensus and reach the anticipated formation. A new obstacle avoidance function is developed using a modified potential field approach to ensure that obstacle avoidance is achieved whether the obstacle is moving or stationary. A velocity matching term is also put forward to guarantee that the velocities of all agents converge to the same value. Furthermore, the stability of the control model is proven. Simulation results are provided to demonstrate the effectiveness of the proposed control.
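The three-term structure described above can be sketched for double-integrator agents in the plane. This is a textbook-style illustration, not the paper's control function: the formation term uses fixed desired offsets rather than the paper's formation control matrix, the obstacle term is a plain repulsive potential field, and all names (`repulsion`, `simulate`, `offsets`, `gamma`) and gains are assumptions.

```python
import math

def repulsion(pos, obstacle, radius=1.0, gain=1.0):
    """Potential-field push away from an obstacle; zero outside `radius`."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist >= radius or dist == 0.0:
        return (0.0, 0.0)
    # Negative gradient of 0.5 * gain * (1/dist - 1/radius)^2,
    # which points away from the obstacle.
    mag = gain * (1.0 / dist - 1.0 / radius) / dist ** 2
    return (mag * dx / dist, mag * dy / dist)

def simulate(pos, offsets, adj, obstacle, steps=3000, dt=0.01, gamma=1.0):
    """Integrate double-integrator agents under a three-term control law."""
    n = len(pos)
    pos = [list(p) for p in pos]
    vel = [[0.0, 0.0] for _ in range(n)]
    for _ in range(steps):
        acc = []
        for i in range(n):
            u = list(repulsion(pos[i], obstacle))   # obstacle avoidance term
            for j in range(n):
                if not adj[i][j]:
                    continue
                for k in range(2):
                    # Formation term: consensus on position minus desired offset.
                    u[k] -= (pos[i][k] - offsets[i][k]) - (pos[j][k] - offsets[j][k])
                    # Velocity matching term: consensus on velocity.
                    u[k] -= gamma * (vel[i][k] - vel[j][k])
            acc.append(u)
        for i in range(n):
            for k in range(2):
                vel[i][k] += dt * acc[i][k]   # update velocity first,
                pos[i][k] += dt * vel[i][k]   # then position (semi-implicit Euler)
    return pos, vel
```

Calling `simulate` with a connected adjacency matrix and an obstacle far from the agents drives the relative positions toward the differences of the entries in `offsets` while the velocities agree; moving the obstacle into the agents' path activates the repulsive term.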
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 70571059).
Abstract: This paper investigates the cluster consensus problem for second-order multi-agent systems by applying the pinning control method to a small collection of the agents. Consensus is attained independently for different agent clusters according to the community structure generated by the group partition of the underlying graph. Sufficient conditions for both cluster and general consensus are obtained by using results from algebraic graph theory and the LaSalle Invariance Principle. Finally, some simple simulations are presented to illustrate the technique.
Funding: Supported by the National Natural Science Foundation of China (Nos. 60674050, 60736022, 10972002, 60774089, 60704039).
Abstract: This paper studies the consensus problems for a group of agents with switching topology and time-varying communication delays, where the dynamics of each agent are modeled as a high-order integrator. A linear distributed consensus protocol is proposed, which depends only on the agent's own information and its neighbors' partial information. By introducing a decomposition of the state vector and performing a state space transformation, the closed-loop dynamics of the multi-agent system are converted into two decoupled subsystems. Based on the decoupled subsystems, some sufficient conditions for the convergence to consensus are established, which provide upper bounds on the admissible communication delays. The explicit expression of the consensus state is also derived. Moreover, the results on consensus seeking for the group of high-order agents are extended to a network of agents with dynamics modeled as a completely controllable linear time-invariant system. It is proved that the convergence to consensus of this network is equivalent to that of the group of high-order agents. Finally, some numerical examples are given to demonstrate the effectiveness of the main results.