The ballistic missile defense system (BMDS) is important for its special role in ensuring national security and maintaining strategic balance. Because developing a real BMDS requires substantial manpower and resources, research on modeling and simulating the system beforehand is essential. BMDS is a typical complex system owing to its nonlinear, adaptive, and uncertain characteristics. Agent-based modeling is well suited to complex systems whose overall behavior is determined by interactions among individual elements. Based on a study of the structure and function of BMDS, a multi-agent decision support system (DSS) comprising missile, radar, and command center agents is established. The objective function of the DSS is constructed by considering the constraints imposed by the radar, the intercept missile, the offensive missile, and the commander. To dynamically generate the optimal interception plan, a variable neighborhood negative selection particle swarm optimization (VNNSPSO) algorithm is proposed to support the decision making of the DSS. The proposed algorithm is compared with standard PSO, constriction factor PSO (CFPSO), linearly decreasing inertia weight PSO (LDPSO), and variable neighborhood PSO (VNPSO) in terms of convergence rate, iteration number, average fitness value, and standard deviation. The simulation results verify the efficiency of the proposed algorithm. The multi-agent DSS is developed on the Repast simulation platform; it can generate intercept plans automatically and supports three-dimensional dynamic display of the missile defense process.
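The sketch below gives a point of reference for one of the baselines named above. It shows minimal global-best PSO with the constriction factor of Clerc and Kennedy (the CFPSO variant) minimizing a toy sphere function. It is not the proposed VNNSPSO: neighborhood switching, negative selection, and the DSS objective are not modeled. The function name, bounds, and parameter values (`c1 = c2 = 2.05`) are illustrative assumptions.

```python
import math
import random

def constriction_pso(objective, dim, bounds, n_particles=30, iters=200,
                     c1=2.05, c2=2.05, seed=0):
    """Minimize `objective` with global-best PSO using a constriction factor."""
    rng = random.Random(seed)
    lo, hi = bounds
    phi = c1 + c2  # the constriction formula requires phi > 4
    chi = 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
    vmax = hi - lo
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = chi * (vel[i][d]
                           + c1 * r1 * (pbest[i][d] - pos[i][d])
                           + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))          # clamp velocity
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:                            # personal best update
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:                           # global best update
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val, chi

def sphere(x):
    return sum(v * v for v in x)

best, best_val, chi = constriction_pso(sphere, dim=5, bounds=(-10.0, 10.0))
```

With `c1 = c2 = 2.05` the constriction coefficient is about 0.7298. The other baselines differ mainly in how the velocity update is damped: LDPSO uses an inertia weight that decreases linearly over the iterations instead.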
Constructing a cross-border power energy system in which multiple power energy agents form an alliance is important for studying cross-border power-trading markets. This study treats multiple neighboring countries as alliances, introduces the neighboring countries' exchange rates into the cross-border multi-agent power-trading market, and proposes a method based on evolutionary game theory for studying each agent's dynamic decision-making behavior. Taking three national agents as an example, the study constructs a tripartite evolutionary game model. It then analyzes how the decision-making behavior of each member-state agent evolves under different values of the initial willingness, the cost of payment, and the additional revenue of the alliance. This research helps realize cross-border energy operations so that trading agents can achieve greater trade profits, and it provides a theoretical basis for cooperation and stability among multiple agents.
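A tripartite evolutionary game of this kind is usually analyzed with replicator dynamics, with one equation per agent for the probability of joining the alliance. The sketch below assumes a deliberately simple, hypothetical payoff. An agent's extra gain from joining is its alliance revenue `r_i` scaled by the average willingness of the other two agents, minus its payment cost `c_i`. The paper's actual payoff matrix, including exchange rates, is not reproduced here.

```python
def tripartite_replicator(x0, revenue, cost, dt=0.01, steps=5000):
    """Euler-integrate dx_i/dt = x_i (1 - x_i) * gain_i for three agents.

    x_i is agent i's probability of joining the alliance; gain_i is the
    (hypothetical) payoff of joining minus the payoff of staying out.
    """
    x = list(x0)
    for _ in range(steps):
        dx = []
        for i in range(3):
            others = [x[j] for j in range(3) if j != i]
            gain = revenue[i] * sum(others) / 2.0 - cost[i]
            dx.append(x[i] * (1.0 - x[i]) * gain)
        # keep probabilities inside [0, 1] despite discretization error
        x = [min(1.0, max(0.0, xi + dt * d)) for xi, d in zip(x, dx)]
    return x

high = tripartite_replicator([0.6, 0.6, 0.6], revenue=[3.0] * 3, cost=[1.0] * 3)
low = tripartite_replicator([0.2, 0.2, 0.2], revenue=[3.0] * 3, cost=[1.0] * 3)
```

With `r = 3` and `c = 1`, joining pays off once the others' average willingness exceeds `c / r = 1/3`. The run starting at 0.6 therefore drifts toward all agents joining, and the run starting at 0.2 drifts toward all agents staying out. This mirrors the abstract's point that the initial willingness value shapes the outcome.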
The strategy evolution process of game players is highly uncertain because of random emergent situations and other external disturbances. This paper investigates strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios under random interference. It considers the risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of participants' manipulative behaviors on the players' state changes. A nonlinear mathematical model is established to describe the strategy decision-making process of the participants in this scenario. The strategy selection interactions, the evolutionary stability of strategies, and the dynamic decision-making process of the game players are then analyzed and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors affect the selection and evolution speed of the agents' strategies differently. In highly uncertain environments in particular, even a small information asymmetry or miscalculation may significantly affect decision-making. These results confirm the feasibility and effectiveness of the proposed method, which better explains the behavioral decision-making process of agents during interaction. This study provides ideas for feasibility analysis and a theoretical reference for improving multi-agent interactive decision-making and the interpretability of game system models.
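One common way to add random interference to evolutionary game dynamics is a stochastic replicator equation integrated with the Euler-Maruyama scheme. The sketch below is a generic illustration under assumed payoffs, not the paper's nonlinear model. Here `m` is a hypothetical maneuver-related payoff parameter, `c` is a hypothetical cost, and `sigma` is the noise intensity.

```python
import math
import random

def stochastic_replicator(x0, y0, m, c, sigma, dt=0.01, steps=3000, seed=0):
    """Euler-Maruyama integration of a two-player stochastic replicator model.

    x, y: probabilities that player 1 / player 2 choose strategy A.
    Payoff advantage of A (hypothetical): m * (opponent's share of A) - c.
    Noise enters as sigma * x (1 - x) dW, so the pure states 0 and 1 stay absorbing.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        dw1 = rng.gauss(0.0, math.sqrt(dt))   # Brownian increments ~ N(0, dt)
        dw2 = rng.gauss(0.0, math.sqrt(dt))
        gx, gy = m * y - c, m * x - c
        x_new = x + x * (1 - x) * (gx * dt + sigma * dw1)
        y_new = y + y * (1 - y) * (gy * dt + sigma * dw2)
        # clip: a discrete noise step can overshoot the unit interval
        x, y = min(1.0, max(0.0, x_new)), min(1.0, max(0.0, y_new))
        path.append((x, y))
    return path

calm = stochastic_replicator(0.8, 0.8, m=2.0, c=1.0, sigma=0.0)
noisy = stochastic_replicator(0.5, 0.5, m=2.0, c=1.0, sigma=0.8, seed=3)
```

Setting `sigma = 0` recovers the deterministic replicator dynamics. Increasing it lets the same initial state end at different strategy profiles from run to run, which is the kind of sensitivity the abstract describes.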
Offline reinforcement learning leverages previously collected datasets to learn optimal policies without access to the real environment. This paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the combinatorially increasing interactions among agents and with the environment. However, in MARL the paradigm of offline pre-training with online fine-tuning has not been studied, and no datasets or benchmarks for offline MARL research are available. In this paper, we facilitate such research by providing large-scale datasets and using them to examine the use of the decision transformer in the context of MARL. We investigate the generalization of MARL offline pre-training in three aspects: 1) between single agents and multiple agents, 2) from offline pre-training to online fine-tuning, and 3) to multiple downstream tasks with few-shot and zero-shot capabilities. We begin by introducing the first offline MARL dataset with diverse quality levels based on the StarCraft II environment, and then propose a novel architecture, the multi-agent decision transformer (MADT), for effective offline learning. MADT leverages the transformer's sequence modeling ability and integrates it seamlessly with both offline and online MARL tasks. A significant benefit of MADT is that it learns generalizable policies that can transfer between different types of agents under different task scenarios. On the StarCraft II offline dataset, MADT outperforms state-of-the-art offline reinforcement learning (RL) baselines, including BCQ and CQL. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency and performs strongly in both few-shot and zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalizability enhancements for MARL.
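Decision-transformer-style models condition each action on the return still to be collected. The sketch below computes returns-to-go for one agent's trajectory. It then interleaves them with states and actions into the (return, state, action) token order used by the original single-agent Decision Transformer. This abstract does not describe how MADT arranges tokens across agents, so treating each agent's trajectory separately is an assumption. The tuple token format is purely illustrative.

```python
def returns_to_go(rewards, gamma=1.0):
    """R_t = r_t + gamma * R_{t+1}, computed backwards over one trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

def build_tokens(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep for one agent."""
    tokens = []
    for R, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("rtg", R), ("state", s), ("action", a)])
    return tokens

tokens = build_tokens(states=["s0", "s1", "s2"], actions=[1, 0, 2], rewards=[1.0, 0.0, 2.0])
```

At inference time, the first return token is set to a target return and is decremented by each observed reward. This is what lets one pre-trained model be steered toward different performance levels.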
In an evolutionary game in which a group performs the same task, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the outcome of the evolutionary game and facilitate completion of the task. First, based on multi-agent theory and to address problems in the original model, a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, a method for calculating the group intelligence level is defined to evaluate the evolutionary game results of the group in the model. Second, the Q-learning algorithm is used to strengthen the guiding effect of the negative feedback tax penalty mechanism. In the model, the action selection strategy of Q-learning is improved, and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model effectively guides individuals to choose cooperation strategies that benefit task completion and stability under different negative feedback factor values and group sizes, thereby improving the group intelligence level.
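The following toy sketch makes the interplay between Q-learning and a tax penalty concrete. Agents repeatedly play a public-goods game, and each keeps single-state Q-values (discount 0) for cooperating and defecting. Defectors pay a tax proportional to the current fraction of defectors, so the penalty weakens as defection becomes rare, which is a negative feedback. The payoff structure, the tax form `tax_k * defect_frac`, and all parameter values are assumptions for illustration. The paper's bounded-rationality selection rule and group intelligence measure are not reproduced.

```python
import random

def simulate_tax_game(tax_k, n_agents=20, r=3.0, rounds=4000, alpha=0.1, eps=0.05, seed=1):
    """Return the average cooperation rate over the last 500 rounds.

    Each agent keeps q[i] = [Q(cooperate), Q(defect)] for a single state.
    Cooperators pay 1 into a pot that is multiplied by r and shared by all;
    defectors pay a tax of tax_k * (fraction of defectors).
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_agents)]
    history = []
    for _ in range(rounds):
        actions = []
        for i in range(n_agents):
            if rng.random() < eps:          # epsilon-greedy exploration
                actions.append(rng.randrange(2))
            else:                           # greedy; ties favor cooperation
                actions.append(0 if q[i][0] >= q[i][1] else 1)
        n_coop = actions.count(0)
        defect_frac = 1.0 - n_coop / n_agents
        share = r * n_coop / n_agents
        for i, a in enumerate(actions):
            reward = share - 1.0 if a == 0 else share - tax_k * defect_frac
            q[i][a] += alpha * (reward - q[i][a])   # Q-learning update with discount 0
        history.append(n_coop / n_agents)
    tail = history[-500:]
    return sum(tail) / len(tail)

with_tax = simulate_tax_game(tax_k=10.0)
without_tax = simulate_tax_game(tax_k=0.0)
```

Without the tax, defecting always pays `1 - r / n_agents` more than cooperating, so cooperation collapses. With the tax, defection stops paying as soon as a small share of the group defects, and most agents settle on cooperation.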
Unlike the organization structure of complex projects in Western countries, the Liang Zong hierarchical organization structure of complex projects in China has two distinct chains, the chief-engineer chain and the general-director chain, to handle the trade-off between technical and management decisions. However, previous work on organization search has focused mainly on single-chain hierarchical organizations in which all decisions are regarded as homogeneous. This work neglected the heterogeneity of, and interdependency between, technical and management decisions. A two-chain hierarchical organization structure mapped from a real complex project is constructed. Then, a discrete decision model for a Liang Zong two-chain hierarchical organization in the NK model framework is proposed. This model proves that this kind of organization structure can greatly reduce the search space and that the search process should reach a final stable state more quickly. For a more complicated decision mechanism, a multi-agent simulation based on the NK model above is used to explore the effect of the two-chain organization structure on the speed, stability, and performance of the search process. The results provide three insights into how the two-chain organization can improve the search process compared with the single-chain hierarchical organization. First, it reduces the number of iterations efficiently. Second, the search is more stable because the search space forms a smoother, hill-like fitness landscape. Third, search performance is generally improved. However, when the organization structure is very complicated, the two-chain organization performs worse than the single-chain organization. These findings on the efficiency of this distinctive Chinese-style organization structure can guide organization design for complex projects.
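The sketch below is for readers unfamiliar with the NK framework. It builds a standard NK fitness landscape: N binary decisions, each contributing a random value that depends on itself and K other decisions. It then runs one-bit-flip steepest-ascent search on that landscape. It does not model the two-chain Liang Zong structure; splitting the decisions into technical and management chains would be an extension on top of this. Function names and the random interaction pattern are illustrative assumptions.

```python
import itertools
import random

def make_nk(n, k, seed=0):
    """Build a random NK fitness function over bit vectors of length n."""
    rng = random.Random(seed)
    # locus i depends on itself plus k other randomly chosen loci
    deps = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # one random contribution per configuration of those k + 1 bits
    tables = [{bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
              for _ in range(n)]

    def fitness(x):
        return sum(tables[i][tuple(x[j] for j in deps[i])] for i in range(n)) / n
    return fitness

def hill_climb(fitness, x):
    """Steepest-ascent one-bit-flip search; returns (local_optimum, fitness, moves)."""
    x, fx, moves = list(x), fitness(x), 0
    while True:
        best_y, best_f = None, fx
        for i in range(len(x)):
            y = x[:]
            y[i] = 1 - y[i]
            fy = fitness(y)
            if fy > best_f:
                best_y, best_f = y, fy
        if best_y is None:          # no improving neighbor: local optimum
            return x, fx, moves
        x, fx, moves = best_y, best_f, moves + 1
```

With `K = 0`, each decision can be optimized independently, so every start reaches the same peak. Raising `K` makes the landscape more rugged and makes the search more likely to stop at different local optima. This is the lens through which the abstract compares single-chain and two-chain organizations.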
The decision-making process for configuring public service facilities in a multi-agent community is usually simplistic and static. A decision-making model for public service facility configuration was built on multi-agent theory to reflect the dynamic changes and interactions of all behavioral subjects, including residents, real estate developers, and the government. The model aims to improve the efficiency of public service facility configuration in communities and the living quality of residents. Taking a community to the east of Jinhui Port in Fengxian District, Shanghai, as an example, the model analyzed the decision-makers' adaptive behaviors and simulated the decision-making criteria. The results indicate that the decision-making model and criteria can serve the purpose of improving the validity and rationality of public service facility configuration in large communities.
In this paper, the decision problem in a large-scale system consisting of several subsystems is considered, and methods for resolving conflicts between the subsystems are explored. Based on the multiperson multiobjective conflict decision (MMCD) model proposed in Ref. [6], the concept of a bargaining solution for conflicts in large-scale systems is presented, and an approach to achieving the bargaining solution is proposed.
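A common bargaining solution concept is the Nash bargaining solution. It picks the alternative that maximizes the product of the parties' utility gains over a disagreement point. The sketch below applies it to a finite set of alternatives for three subsystems. The abstract does not state whether the MMCD-based approach of Ref. [6] uses this particular solution, so treat it as an illustrative assumption. The alternatives and utilities are made up.

```python
import math

def nash_bargaining(alternatives, disagreement):
    """Return (name, product) of the alternative maximizing prod_i (u_i - d_i).

    Only alternatives that strictly improve every subsystem over its
    disagreement utility d_i are eligible; returns (None, -inf) if none do.
    """
    best_name, best_value = None, -math.inf
    for name, utilities in alternatives.items():
        gains = [u - d for u, d in zip(utilities, disagreement)]
        if any(g <= 0 for g in gains):
            continue
        value = math.prod(gains)
        if value > best_value:
            best_name, best_value = name, value
    return best_name, best_value

options = {"A": (5.0, 5.0, 5.0), "B": (8.0, 3.0, 4.0), "C": (6.0, 6.0, 3.0)}
choice = nash_bargaining(options, disagreement=(2.0, 2.0, 2.0))
```

The balanced alternative A wins even though B gives the first subsystem its highest utility, because the product rewards gains that are shared across all subsystems.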
Genetic microarrays give researchers a huge amount of data on many diseases, represented by gene expression intensities. In genomic medicine, gene expression analysis aims to find strategies for preventing and treating diseases with high mortality rates, such as the various cancers. Genomic medicine therefore requires complex information technology. The purpose of this paper is to present a multi-agent system developed to improve gene expression analysis by automating two tasks: identifying the genes involved in a cancer and classifying tumors according to molecular biology. The agents that make up the system read gene intensity data files from microarrays and pre-process this information. They then use machine learning methods to group the genes involved in a disease process and to classify samples, which could suggest new tumor subtypes that are difficult to identify from morphology. Our results show that the multi-agent system requires minimal user intervention and that the agents generate knowledge that reduces the time and complexity of prevention and diagnosis work, allowing more effective treatment of tumors.
Collision avoidance decision-making models for multiple agents in a virtual driving environment are studied. Based on the behavioral characteristics and hierarchical structure of collision avoidance decision-making in real-life driving, the Delphi method and mathematical statistics are used to construct a pairwise comparison judgment matrix of collision avoidance decision choices for each collision situation. The analytic hierarchy process (AHP) is adopted to establish the agents' collision avoidance decision-making model. To simulate drivers' characteristics, driver factors are added to categorize driving into impatient, normal, and cautious modes. The results show that this model can simulate the human thinking process, and the agents in the virtual environment can handle collision situations and make decisions to avoid collisions without intervention. The model also reflects the diversity and uncertainty of real-life driving behaviors and solves the multi-objective, multi-choice priority ranking problem in multi-vehicle collision scenarios. This multi-agent collision avoidance model is feasible and effective. It can provide richer, more lifelike virtual scenes for driving simulators that reflect real traffic environments more faithfully, and it can also improve the practicality of driving simulators.
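The AHP step can be illustrated with a short sketch. It derives a priority vector from a pairwise comparison judgment matrix by power iteration and checks Saaty's consistency ratio; a CR below about 0.1 is conventionally acceptable. The three collision avoidance choices and the judgment values are hypothetical. In the paper, the matrices come from Delphi-based expert judgments for each collision situation and driver mode.

```python
# Saaty's random consistency index RI for matrix sizes 1..9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_priorities(A, iters=100, tol=1e-12):
    """Return (weights, lambda_max, consistency_ratio) for a reciprocal matrix A."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):                       # power iteration on A
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(Aw)
        new_w = [v / total for v in Aw]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Aw[i] / w[i] for i in range(n)) / n  # principal eigenvalue estimate
    ci = (lam - n) / (n - 1) if n > 1 else 0.0     # consistency index
    ri = RANDOM_INDEX[n]
    return w, lam, (ci / ri if ri > 0 else 0.0)

# Hypothetical judgments: brake vs. change lane vs. accelerate
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, lam_max, cr = ahp_priorities(judgments)
```

This perfectly consistent example gives weights of 4/7, 2/7, and 1/7 with CR = 0. A separate matrix per driver mode (impatient, normal, cautious) could shift which choice ranks first.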
Traditional 3Ni weathering steel cannot fully meet the requirements of offshore engineering development. Designing novel 3Ni steels with added microalloying elements such as Mn or Nb for strength enhancement has therefore become a trend. In this study, the stress-assisted corrosion behavior of a newly designed high-strength 3Ni steel was investigated using the corrosion big data method. Information on the corrosion process was recorded by galvanic corrosion current monitoring. The gradient boosting decision tree (GBDT) machine learning method was used to mine the corrosion mechanism, and the importance of structural factors was investigated. Field exposure tests were conducted to verify the results calculated by the GBDT method. The results indicated that the GBDT method can be effectively used to study the influence of structural factors on the corrosion process of 3Ni steel. Mn and Cu additions act through different mechanisms in the stress-assisted corrosion of 3Ni steel. In the early stage of corrosion, Mn and Cu have no obvious effect on the corrosion rate of non-stressed 3Ni steel. Once corrosion reached a stable state, increasing the Mn content increased the corrosion rate of 3Ni steel, whereas Cu reduced it. Under stress, both increasing the Mn content and adding Cu can inhibit the corrosion process. The corrosion behavior of 3Ni steel exposed outdoors is consistent with that derived from corrosion big data technology, verifying the reliability of the big data evaluation method and the choice of data prediction model.
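The idea of ranking structural factors with GBDT can be sketched with a tiny gradient-boosted ensemble of depth-one trees (stumps) under squared-error loss. In this sketch, a feature's importance is the total squared-error reduction of the splits that use it. This is a simplified stand-in, not the study's model; in practice a library implementation with deeper trees would be used. The toy data and the meaning attached to each feature are hypothetical.

```python
def fit_stump(X, r):
    """Best single split on residuals r by squared-error reduction.

    Returns (feature, threshold, left_mean, right_mean, gain) or None.
    """
    n, total = len(X), sum(r)
    best = None
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left = 0.0
        for k in range(1, n):
            left += r[order[k - 1]]
            lo_v, hi_v = X[order[k - 1]][f], X[order[k]][f]
            if lo_v == hi_v:
                continue  # cannot split between equal values
            right = total - left
            gain = left ** 2 / k + right ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[4]:
                best = (f, (lo_v + hi_v) / 2, left / k, right / (n - k), gain)
    return best

def gbdt_importance(X, y, rounds=50, lr=0.1):
    """Boost stumps on squared loss; return normalized split-gain importances."""
    pred = [sum(y) / len(y)] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]   # negative gradient of squared loss
        stump = fit_stump(X, resid)
        if stump is None:
            break
        f, thr, left_val, right_val, gain = stump
        importance[f] += gain
        pred = [p + lr * (left_val if x[f] <= thr else right_val) for p, x in zip(pred, X)]
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance

# Toy data: feature 0 (e.g. a hypothetical alloy-content level) drives the target;
# feature 1 is an unrelated permutation of the same values.
X = [[i / 20, ((i * 7) % 20) / 20] for i in range(20)]
y = [1.0 if i > 10 else 0.0 for i in range(20)]
importance = gbdt_importance(X, y)
```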
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in the 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising way to decrease computation overhead and reduce task processing latency. However, the high mobility of vehicles makes it challenging for vehicles to make avatar migration decisions independently based on current and future vehicle status. To address these challenges, in this paper we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. MAPPO suffers from slow convergence resulting from the curse of dimensionality and from non-stationarity caused by shared parameters. To overcome these issues, we further propose a transformer-based MAPPO approach that uses sequential decision-making models to represent the relationships among agents efficiently. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure the traceability of sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution in UAV-assisted vehicular Metaverses by approximately 20%.
This paper studies time-varying formation control with finite-time prescribed performance for nonstrict-feedback second-order multi-agent systems with unmeasured states and unknown nonlinearities. To handle the unknown nonlinearities, neural networks are applied to approximate the inherent dynamics of the system. In addition, because of limitations of the actual working conditions, each follower agent can obtain only locally measurable partial state information of the leader agent. To address this problem, a neural network state observer based on the leader state information is designed. Then, a finite-time prescribed performance adaptive output feedback control strategy is proposed by restricting the sliding mode surface to a prescribed region. This ensures that the closed-loop system is practically finite-time stable and that the formation errors of the multi-agent system converge to the prescribed performance bound in finite time. Finally, a numerical simulation demonstrates the practicality and effectiveness of the developed algorithm.
Traditional fuze interference systems use a single, uniform interference form, which leads to a low interference success rate against air defense missile radio fuzes. To address this problem, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to enlarge the state space. Second, a multidimensional action space, whose search range changes with the missile-target distance, is used to select parameters and minimize the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function determines the reward value from the range state, output power, and parameter quantity of the interference form. Offline experiments covering full-range missile-target rendezvous verified the effectiveness of the method in selecting the range of action space parameters and in designing the discrimination degree of the reward function. These experiments also yielded the optimal interference form for each distance state. Compared with the single-interference decision method, the proposed method effectively improves the interference success rate.
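The sketch below illustrates the three ingredients named above: distance-state discretization, a weighted reward, and the tabular Q-learning update. The abstract gives neither the distance bins nor the reward weights. The bin edges, the weights, the assumption that success at larger distance earns more, and the three interference forms are all hypothetical placeholders.

```python
import bisect

# Hypothetical missile-target distance bin edges in metres (not from the paper).
DISTANCE_EDGES = [20.0, 50.0, 100.0, 200.0]
N_STATES = len(DISTANCE_EDGES) + 1

def distance_state(distance_m):
    """Map a missile-target distance to a discrete state index 0..N_STATES-1."""
    return bisect.bisect_right(DISTANCE_EDGES, distance_m)

def weighted_reward(fuze_signal_gone, state, power_frac, param_frac,
                    w_range=0.5, w_power=0.3, w_params=0.2):
    """Hypothetical weighted reward.

    Success (fuze signal disappears) earns more at larger distance states;
    output power and the number of interference parameters (both normalized
    to [0, 1]) are penalized.
    """
    base = 1.0 + w_range * state / (N_STATES - 1) if fuze_signal_gone else -1.0
    return base - w_power * power_frac - w_params * param_frac

def q_update(Q, s, a, r, s_next=None, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; s_next=None marks the end of an engagement."""
    target = r if s_next is None else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

Q = {s: [0.0, 0.0, 0.0] for s in range(N_STATES)}   # 3 hypothetical interference forms
s = distance_state(150.0)
r = weighted_reward(True, s, power_frac=0.5, param_frac=0.5)
q_update(Q, s, a=1, r=r)
```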
This article investigates robust adaptive leaderless consensus for heterogeneous uncertain nonminimum-phase linear multi-agent systems over directed communication graphs. Each agent is assumed to have unknown nominal dynamics and to be subject to external disturbances and/or unmodeled dynamics. A novel distributed robust adaptive control strategy is proposed. It is shown that the proposed strategy solves the robust adaptive leaderless consensus problem under some sufficient conditions. Two examples demonstrate the efficacy of the proposed control strategy.
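For orientation, robust adaptive designs build on the basic leaderless consensus protocol x_i' = sum_j a_ij (x_j - x_i). This protocol reaches agreement over a directed graph that contains a spanning tree. The sketch below simulates it for identical single-integrator agents on a hypothetical directed ring. It deliberately ignores heterogeneity, unknown dynamics, disturbances, and nonminimum-phase behavior, which are exactly what the article's strategy addresses.

```python
def simulate_consensus(x0, adjacency, dt=0.05, steps=2000):
    """Forward-Euler simulation of x_i' = sum_j a_ij (x_j - x_i).

    adjacency[i][j] > 0 means agent i receives information from agent j.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n)) for i in range(n)]
        x = [xi + dt * d for xi, d in zip(x, dx)]
    return x

# Directed ring: agent i listens only to agent i-1 (agent 0 listens to agent 3)
ring = [[1.0 if j == (i - 1) % 4 else 0.0 for j in range(4)] for i in range(4)]
final = simulate_consensus([1.0, 4.0, -2.0, 7.0], ring)
```

Every agent on the ring has one incoming and one outgoing link, which makes the digraph balanced, so the agents agree on the average of their initial states (2.5 here). On an unbalanced graph with a spanning tree, they still agree, but on a weighted average.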
The presence of numerous uncertainties in hybrid decision information systems (HDISs) makes attribute reduction a formidable task. Currently available attribute reduction algorithms include those based on Pawlak attribute importance, the Skowron discernibility matrix, and information entropy. These algorithms struggle to manage multiple uncertainties in HDISs simultaneously, such as precisely measuring the disparities between nominal attribute values and handling attributes with fuzzy boundaries and abnormal values. To address these issues, this paper studies attribute reduction within HDISs. First, a novel metric based on the decision attribute is introduced to measure the differences between nominal attribute values accurately. This new distance metric, named the supervised distance, effectively quantifies the differences between nominal attribute values. Then, based on this metric, a novel fuzzy relation is defined from the perspective of “feedback on parity of attribute values to attribute sets”. This fuzzy relation is a valuable tool for addressing the challenges posed by abnormal attribute values. Furthermore, using this fuzzy relation, the fuzzy conditional information entropy is defined to address the challenges posed by fuzzy attributes. It effectively quantifies the uncertainty associated with fuzzy attribute values, providing a robust framework for handling fuzzy information in hybrid information systems. Finally, an attribute reduction algorithm using the fuzzy conditional information entropy is presented. Experimental results on 12 datasets show that the average reduction rate of our algorithm reaches 84.04%. Classification accuracy improves by 3.91% compared with the original datasets and by an average of 11.25% compared with nine other state-of-the-art reduction algorithms. These results indicate that our algorithm is highly effective in managing the intricate uncertainties inherent in hybrid data.
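The sketch below illustrates fuzzy conditional information entropy with greedy forward attribute reduction. It uses a common textbook form, H(D|B) = -(1/n) sum_i log2(|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|), where fuzzy cardinality is the sum of memberships. It pairs this with a simple thresholded similarity on normalized numeric values and exact matching for nominal values. The paper's supervised distance and its parity-feedback relation are not reproduced; the `delta` threshold and the toy data are assumptions.

```python
import math

def relation(values, kind, delta=0.3):
    """Fuzzy similarity matrix for one attribute (nominal: exact match)."""
    n = len(values)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if kind == "nominal":
                R[i][j] = 1.0 if values[i] == values[j] else 0.0
            else:
                d = abs(values[i] - values[j])
                R[i][j] = 1.0 - d if d <= delta else 0.0
    return R

def fuzzy_cond_entropy(data, kinds, attrs, decision):
    """H(D|B) with R_B = min over attributes in B (empty B: all fully similar)."""
    n = len(decision)
    R = [[1.0] * n for _ in range(n)]
    for a in attrs:
        Ra = relation([row[a] for row in data], kinds[a])
        R = [[min(R[i][j], Ra[i][j]) for j in range(n)] for i in range(n)]
    h = 0.0
    for i in range(n):
        granule = sum(R[i])                                   # |[x_i]_B|
        overlap = sum(R[i][j] for j in range(n) if decision[j] == decision[i])
        h -= math.log2(overlap / granule)                     # R[i][i] = 1 keeps both > 0
    return h / n

def greedy_reduct(data, kinds, decision, eps=1e-12):
    """Add the attribute that lowers H(D|B) most until it matches H(D|C)."""
    all_attrs = list(range(len(kinds)))
    target = fuzzy_cond_entropy(data, kinds, all_attrs, decision)
    reduct = []
    while fuzzy_cond_entropy(data, kinds, reduct, decision) > target + eps:
        remaining = [a for a in all_attrs if a not in reduct]
        reduct.append(min(remaining,
                          key=lambda a: fuzzy_cond_entropy(data, kinds, reduct + [a], decision)))
    return reduct

data = [[0.0, 0.0], [0.1, 1.0], [0.9, 0.0], [1.0, 1.0]]   # attribute 1 is noise
kinds = ["numeric", "numeric"]
decision = [0, 0, 1, 1]
reduct = greedy_reduct(data, kinds, decision)
```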
The rapid growth of network services leads to long service request processing times and high deployment costs when deploying network function virtualization (NFV) service function chains (SFCs) in 5G networks. To address these problems, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). First, for the resource-constrained case, an SFC deployment optimization model is constructed that increases the request acceptance rate while minimizing latency and deployment cost. Subsequently, the dynamic problem is modeled as a Markov decision process (MDP) to adapt to the evolving states of network resources. Finally, SFCs are allocated to different agents under a collaborative deployment strategy in which each agent aims to maximize the request acceptance rate or to minimize latency and cost. These agents learn strategies from historical data of the virtual network functions in SFCs to guide server node selection. They achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Simulation results indicate that the proposed method meets performance requirements and resource capacity constraints while increasing the request acceptance rate compared with the benchmark algorithms. It also reduces end-to-end latency by 4.942% and deployment cost by 8.045%.
Efficient exploration in complex coordination tasks is a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult for tasks with latent variables that agents cannot observe directly. However, most existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of their influence on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. The algorithm, called multi-agent soft actor-critic with latent variable (MASAC-LV), uses variational inference to infer a compact latent variable representation space from a large amount of offline experience. In addition, we derive a counterfactual policy whose input contains no latent variables and quantify the difference between the actual policy and the counterfactual policy with a distance function. This quantified difference serves as an intrinsic motivation that gives additional rewards according to how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared with other baseline algorithms.
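The abstract does not name the distance function, so the sketch below shows only one plausible choice. Soft actor-critic commonly uses diagonal Gaussian policies. For such policies, the KL divergence between the actual policy (conditioned on the latent variable) and the counterfactual policy has a closed form, and a scaled version can serve as the intrinsic reward. The scale `beta`, the function names, and the example policies are assumptions.

```python
import math

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) for diagonal Gaussians, summed over action dimensions."""
    kl = 0.0
    for m1, s1, m2, s2 in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5
    return kl

def intrinsic_reward(actual, counterfactual, beta=0.1):
    """beta * KL(actual policy || counterfactual policy); each policy is (mu, std)."""
    return beta * gaussian_kl(actual[0], actual[1], counterfactual[0], counterfactual[1])

# Hypothetical 2-D action: the latent variable shifts only the first action dimension.
bonus = intrinsic_reward(([0.5, 0.0], [1.0, 1.0]), ([0.0, 0.0], [1.0, 1.0]))
```

When the latent variable leaves an agent's policy unchanged, the bonus is zero. The more the latent variable shifts an agent's action distribution, the larger the extra reward that agent receives.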
Funding (cross-border power-trading study): National Key R&D Program of China (Grant No. 2022YFB2703500); National Natural Science Foundation of China (Grant No. 52277104); National Key R&D Program of Yunnan Province (202303AC100003); Applied Basic Research Foundation of Yunnan Province (202301AT070455, 202101AT070080); Revitalizing Talent Support Program of Yunnan Province (KKRD202204024).
文摘Constructing a cross-border power energy system with multiagent power energy as an alliance is important for studying cross-border power-trading markets.This study considers multiple neighboring countries in the form of alliances,introduces neighboring countries’exchange rates into the cross-border multi-agent power-trading market and proposes a method to study each agent’s dynamic decision-making behavior based on evolutionary game theory.To this end,this study uses three national agents as examples,constructs a tripartite evolutionary game model,and analyzes the evolution process of the decision-making behavior of each agent member state under the initial willingness value,cost of payment,and additional revenue of the alliance.This research helps realize cross-border energy operations so that the transaction agent can achieve greater trade profits and provides a theoretical basis for cooperation and stability between multiple agents.
Abstract: The strategy evolution process of game players is highly uncertain because of random emergent situations and other external disturbances. This paper investigates strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the risks that random disturbances may pose to the players' autonomous decision-making. It also considers the impact of participants' manipulative behaviors on the players' state changes. A nonlinear mathematical model is established to describe the participants' strategy decision-making process in this scenario. The players' strategy selection interactions, strategy evolution stability, and dynamic decision-making process are then analyzed and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors affect the selection and evolutionary speed of the agents' strategies differently. In particular, in a highly uncertain environment, even a small information asymmetry or miscalculation may significantly affect decision-making. These results confirm the feasibility and effectiveness of the proposed method, which better explains the agents' behavioral decision-making during interaction. This study offers ideas for feasibility analysis and a theoretical reference for improving multi-agent interactive decision-making and the interpretability of game system models.
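One common way to model random interference in strategy evolution is a stochastic replicator equation driven by Brownian noise, simulated with the Euler-Maruyama scheme. The abstract does not specify the paper's nonlinear model, so the following are assumptions for illustration:

- the constant payoff gap;
- the multiplicative noise form sigma * x * (1 - x);
- the clipping to [0, 1];
- all parameter values.

```python
import math
import random


def stochastic_replicator(x0, payoff_gap, sigma, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama for dx = x(1-x)*gap*dt + sigma*x(1-x)*dW, clipped to [0, 1].

    x is the share of players using the strategy of interest; payoff_gap is its
    payoff advantage; sigma scales the random interference.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x += x * (1 - x) * payoff_gap * dt + sigma * x * (1 - x) * dw
        x = min(1.0, max(0.0, x))  # keep x a valid probability
        path.append(x)
    return path
```

Setting sigma = 0 recovers the deterministic replicator path. Raising sigma with a fixed seed shows how interference slows or reverses convergence along individual sample paths.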
Funding: Linghui Meng was supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA27030300); Haifeng Zhang was supported in part by the National Natural Science Foundation of China (No. 62206289).
Abstract: Offline reinforcement learning leverages previously collected offline datasets to learn optimal policies without accessing the real environment. This paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the combinatorially increased interactions among agents and with the environment. However, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we facilitate such research by providing large-scale datasets and using them to examine the use of the decision transformer in MARL. We investigate the generalization of MARL offline pre-training in three aspects: 1) between single agents and multiple agents, 2) from offline pre-training to online fine-tuning, and 3) to multiple downstream tasks with few-shot and zero-shot capabilities. We start by introducing the first offline MARL dataset with diverse quality levels, based on the StarCraft II environment. We then propose a novel architecture, the multi-agent decision transformer (MADT), for effective offline learning. MADT leverages the transformer's sequence modelling ability and integrates it seamlessly with both offline and online MARL tasks. A significant benefit of MADT is that it learns generalizable policies that can transfer between different types of agents under different task scenarios. On the StarCraft II offline dataset, MADT outperforms state-of-the-art offline reinforcement learning (RL) baselines, including BCQ and CQL. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency and achieves strong performance in both few-shot and zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalizability enhancements for MARL.
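Decision transformers condition each action on the return-to-go, the sum of rewards from step t to the end of the trajectory. The model is fed an interleaved (return-to-go, state, action) token sequence. The sketch below builds such a sequence for one agent's trajectory. The abstract does not state how MADT assigns rewards to individual agents, so a single per-agent reward list is assumed. The token tuples are placeholders for the learned embeddings a real model would use.

```python
def returns_to_go(rewards):
    """R_t = r_t + r_{t+1} + ... + r_T (undiscounted, as in the decision transformer)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_sequence(rewards, states, actions):
    """Interleave (return-to-go, state, action) tokens for one agent's trajectory."""
    seq = []
    for ret, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("rtg", ret), ("state", s), ("action", a)])
    return seq
```

At inference time, the return-to-go is initialized to a desired target return and reduced by each reward actually received.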
Funding: Supported by the National Key R&D Program of China (2017YFB1400105).
Abstract: In group evolutionary games in which all members pursue the same task, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the outcome of an evolutionary game and facilitate completion of the task. First, based on multi-agent theory, a negative feedback tax penalty mechanism is proposed to solve problems in the original model and guide the strategy selection of individuals in the group. In addition, a method for calculating the group intelligence level is defined to evaluate the group's evolutionary game results. Second, the Q-learning algorithm is used to strengthen the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved. A bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results cover different negative feedback factor values and group sizes. In all cases, the proposed model effectively guides individuals to choose cooperation strategies that benefit task completion and stability, thereby raising the group intelligence level.
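The interplay between a negative feedback tax and Q-learning can be illustrated with a toy public goods game. This sketch is a simplification with several assumptions and is not the paper's model:

- Each agent keeps stateless Q-values for cooperating and defecting and selects actions epsilon-greedily.
- Defectors pay a tax equal to a hypothetical factor k times the current fraction of defectors, so the penalty grows as defection spreads.
- The multiplier r = 1.5, the group size, and the learning parameters are illustrative.

```python
import random


def payoffs(actions, r=1.5, k=0.0):
    """Public goods game with a negative feedback tax on defectors.

    actions[i] is 1 for cooperate, 0 for defect. Each cooperator contributes 1;
    the pot is multiplied by r and shared equally. Defectors pay a tax of
    k * (fraction of defectors).
    """
    n = len(actions)
    n_coop = sum(actions)
    share = r * n_coop / n
    tax = k * (n - n_coop) / n
    return [share - 1.0 if a == 1 else share - tax for a in actions]


def simulate(k, n_agents=20, rounds=3000, alpha=0.1, eps=0.1, seed=0):
    """Stateless epsilon-greedy Q-learning; returns the mean cooperation
    rate over the last 500 rounds."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_agents)]  # q[i][0]: defect, q[i][1]: cooperate
    coop_rates = []
    for _ in range(rounds):
        actions = []
        for qi in q:
            if rng.random() < eps or qi[0] == qi[1]:
                actions.append(rng.randrange(2))  # explore, or break a tie
            else:
                actions.append(0 if qi[0] > qi[1] else 1)
        for qi, a, reward in zip(q, actions, payoffs(actions, k=k)):
            qi[a] += alpha * (reward - qi[a])  # no next-state term: one-shot stage game
        coop_rates.append(sum(actions) / n_agents)
    tail = coop_rates[-500:]
    return sum(tail) / len(tail)
```

Under these assumptions, cooperation is expected to stay near the exploration floor when k = 0 and to rise markedly for a large k such as 10.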
Funding: Supported by the National Natural Science Foundation of China (71571057, 71390522) and the Key Lab for Public Engineering Audit of Jiangsu Province, Nanjing Audit University (GGSS2016-08).
Abstract: Unlike the organization structure of complex projects in Western countries, the Liang Zong hierarchical organization structure of complex projects in China has two chains to handle the trade-off between technical and management decisions: the chief-engineer chain and the general-director chain. However, previous work on organization search has mainly focused on single-chain hierarchical organizations in which all decisions are regarded as homogeneous. Such work neglects the heterogeneity of, and interdependency between, technical and management decisions. A two-chain hierarchical organization structure mapped from a real complex project is constructed. Then, a discrete decision model for a Liang Zong two-chain hierarchical organization is proposed within the NK model framework. The model proves that this kind of organization structure greatly reduces the search space and that the search process should reach a final stable state more quickly. For a more complicated decision mechanism, a multi-agent simulation based on this NK model is used to explore the effect of the two-chain organization structure on the speed, stability, and performance of the search process. The results provide three insights into how the two-chain organization improves the search process compared with the single-chain hierarchical organization:
- it reduces the number of iterations efficiently;
- the search is more stable because the search space is a smoother, hill-like fitness landscape;
- in general, search performance is improved.
However, when the organization structure is very complicated, a two-chain organization performs worse than a single-chain organization. These findings on the efficiency of this distinctive Chinese-style organization structure can guide organization design for complex projects.
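In the NK model, a configuration is a string of N binary decisions, so the landscape has 2^N configurations. Each decision's fitness contribution depends on its own value and on K other decisions. The sketch below builds a random NK landscape and runs one-bit-flip steepest-ascent local search, a basic search step in organization-search studies. The random dependency structure and the steepest-ascent rule are assumptions. The paper's two-chain split of decisions between the chief-engineer and general-director chains is not modeled here.

```python
import itertools
import random


def make_nk(n, k, seed=0):
    """Random NK landscape: locus i's contribution depends on itself and k other loci."""
    rng = random.Random(seed)
    deps = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(x):
        # average of the n local contributions, each looked up in its own table
        return sum(tables[i][tuple(x[j] for j in deps[i])] for i in range(n)) / n

    return fitness


def hill_climb(fitness, x):
    """Steepest-ascent one-bit-flip search; stops at a local optimum."""
    x = list(x)
    steps = 0
    while True:
        best, best_val = None, fitness(x)
        for i in range(len(x)):
            y = x[:]
            y[i] = 1 - y[i]
            val = fitness(y)
            if val > best_val:
                best, best_val = y, val
        if best is None:  # no single flip improves fitness
            return x, steps
        x = best
        steps += 1
```

The returned step count is a rough stand-in for the number of search iterations that the abstract compares across organization structures.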
Funding: National Natural Science Foundation of China (No. 71403173)
Abstract: The decision-making process for configuring public service facilities in a multi-agent community is usually simplistic and static. The behavioral subjects involved include residents, real estate developers, and the government. To reflect the dynamic changes in, and interactions among, these subjects, a decision-making model for public service facility configuration was built on multi-agent theory. Its aims are to improve the efficiency of facility configuration in the community and to raise residents' quality of life. Taking a community to the east of Jinhui Port in Fengxian District, Shanghai, as an example, the model analyzed the decision-makers' adaptive behaviors and simulated the decision-making criteria. The results indicate that the decision-making model and criteria can effectively improve the validity and rationality of public service facility configuration in large communities.
Abstract: In this paper, the decision problem in a large-scale system consisting of several subsystems is considered, and methods for resolving conflicts between the subsystems are explored. Based on the multiperson multiobjective conflict decision (MMCD) model proposed in Ref. [6], the concept of a bargaining solution for conflicts in large-scale systems is presented, and an approach to achieving the bargaining solution is proposed.
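The abstract does not say which bargaining solution is used. One common choice is the Nash bargaining solution: among the feasible alternatives, pick the one that maximizes the product of each party's utility gain over the disagreement point. The sketch makes three assumptions:

- the set of alternatives is finite;
- each subsystem's multiobjective outcome has already been collapsed into one utility, here by a weighted sum;
- the alternative names and numbers are hypothetical.

```python
def weighted_utility(objectives, weights):
    """Collapse one subsystem's objective values into a single utility (weighted sum)."""
    return sum(w * v for w, v in zip(weights, objectives))


def nash_bargaining(alternatives, disagreement):
    """Return the alternative maximizing the product of gains over the disagreement point.

    alternatives maps a name to a tuple with one utility per subsystem.
    Alternatives that leave any subsystem no better off than disagreement are skipped.
    Returns None if no alternative qualifies.
    """
    best, best_product = None, float("-inf")
    for name, utils in alternatives.items():
        gains = [u - d for u, d in zip(utils, disagreement)]
        if any(g <= 0 for g in gains):
            continue
        product = 1.0
        for g in gains:
            product *= g
        if product > best_product:
            best, best_product = name, product
    return best
```

Consider alternatives A = (5, 1), B = (3, 3) and C = (1, 5) with disagreement point (0, 0). The products are 5, 9 and 5, so the balanced alternative B is selected.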
Abstract: Genetic microarrays give researchers a huge amount of data on many diseases, represented as gene expression intensities. In genomic medicine, gene expression analysis aims to find strategies for preventing and treating diseases with high mortality rates, such as the various cancers. Genomic medicine therefore requires complex information technology. The purpose of this paper is to present a multi-agent system developed to improve gene expression analysis. It automates two tasks: identifying the genes involved in a cancer and classifying tumors according to molecular biology. The agents that make up the system read files of gene intensity data from microarrays and pre-process this information. They then use machine learning methods to group the genes involved in a disease process and to classify samples, which could reveal new tumor subtypes that are difficult to identify from their morphology. Our results show that the multi-agent system requires minimal user intervention. The agents generate knowledge that reduces the time and complexity of prevention and diagnosis work, thus allowing more effective treatment of tumors.
Funding: Supported by the National Basic Research Program (973 Program, No. 2004CB719402), the National Natural Science Foundation of China (No. 60736019), and the Natural Science Foundation of Zhejiang Province, China (No. Y105430).
Abstract: Collision avoidance decision-making models for multiple agents in a virtual driving environment are studied. The work builds on the behavioral characteristics and hierarchical structure of collision avoidance decision-making in real-life driving. The Delphi approach and mathematical statistics are used to construct, for each collision situation, a pairwise comparison judgment matrix of the collision avoidance decision choices. The analytic hierarchy process (AHP) is adopted to establish the agents' collision avoidance decision-making model. To simulate drivers' characteristics, driver factors are added that categorize driving into an impatient mode, a normal mode, and a cautious mode. The results show that the model can simulate the human thinking process. Agents in the virtual environment can handle collision situations and make decisions to avoid collisions without intervention. The model also reflects the diversity and uncertainty of real-life driving behaviors. It solves the multi-objective, multi-choice priority ranking problem in multi-vehicle collision scenarios. This multi-agent collision avoidance model is feasible and effective. It can provide driving simulators with richer, closer-to-life virtual scenes that reflect the real traffic environment more faithfully, and it can also promote the practicality of driving simulators.
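In AHP, the priority weights of the alternatives are the normalized principal eigenvector of the pairwise comparison matrix. The consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1), checks whether the judgments are acceptably consistent; CR < 0.1 is the usual threshold. The sketch computes both by power iteration, using Saaty's random index values. Two parts are hypothetical:

- the three-option matrix in the test, which could stand for braking, changing lanes and keeping speed;
- the mapping to the driver modes in the abstract, which would correspond to different judgment matrices.

```python
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_priorities(matrix, iters=100):
    """Priority vector (principal eigenvector, sums to 1) and consistency ratio.

    matrix[i][j] states how strongly option i is preferred over option j
    (Saaty 1-9 scale), with matrix[j][i] == 1 / matrix[i][j]. Supports n <= 9.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iters):  # power iteration
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr
```

A perfectly consistent matrix, where matrix[i][k] == matrix[i][j] * matrix[j][k], gives CR = 0. Its weights are proportional to any column's reciprocals.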
Funding: Supported by the National Natural Science Foundation of China (No. 52203376) and the National Key Research and Development Program of China (No. 2023YFB3813200).
Abstract: Traditional 3Ni weathering steel cannot fully meet the requirements of offshore engineering development. As a result, designing novel 3Ni steels with added microalloying elements such as Mn or Nb for strength enhancement has become a trend. In this study, the stress-assisted corrosion behavior of a newly designed high-strength 3Ni steel was investigated using the corrosion big data method. Information on the corrosion process was recorded by galvanic corrosion current monitoring. The gradient boosting decision tree (GBDT) machine learning method was used to mine the corrosion mechanism, and the importance of the structure factors was investigated. Field exposure tests were conducted to verify the results calculated with the GBDT method. The results indicated that the GBDT method can be used effectively to study the influence of structural factors on the corrosion process of 3Ni steel. Mn and Cu affect the stress-assisted corrosion of 3Ni steel through different mechanisms. Neither has an obvious effect on the corrosion rate of non-stressed 3Ni steel during the early stage of corrosion. Once corrosion reached a stable state, increasing the Mn content increased the corrosion rate of 3Ni steel, whereas Cu reduced it. In the presence of stress, both increasing the Mn content and adding Cu can inhibit the corrosion process. The corrosion behavior of outdoor-exposed 3Ni steel is consistent with the behavior derived from corrosion big data technology. This agreement verifies the reliability of the big data evaluation method and of the chosen data prediction model.
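The sketch below shows the core of gradient boosting with decision stumps for squared loss:

- each round fits a one-split tree to the current residuals, which are the negative gradient;
- a shrunken copy of that tree is added to the ensemble;
- a feature's importance is the total squared-error reduction from the splits that use it.

It is a from-scratch illustration, not the paper's pipeline. The feature layout and all data are hypothetical; columns could stand for Mn content, Cu content or applied stress.

```python
def fit_stump(X, residuals):
    """Best single split (feature, threshold) for the residuals under squared error."""
    n = len(residuals)
    mean_all = sum(residuals) / n
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no valid split: predict the mean residual
        return (lambda row: mean_all), None, 0.0
    best_sse, feat, split, left_val, right_val = best
    stump = lambda row: left_val if row[feat] <= split else right_val
    return stump, feat, base_sse - best_sse


def gbdt_fit(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting with stumps for squared loss; returns (predict, importance)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]  # negative gradient of 0.5*(y-p)^2
        stump, feat, gain = fit_stump(X, residuals)
        if feat is not None:
            importance[feat] += gain
        stumps.append(stump)
        preds = [p + lr * stump(row) for p, row in zip(preds, X)]
    total = sum(importance) or 1.0
    importance = [v / total for v in importance]

    def predict(row):
        return base + lr * sum(s(row) for s in stumps)

    return predict, importance
```

In the test data, the target depends only on the first column, so all importance is assigned to it.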
Funding: Supported in part by NSFC (62102099, U22A2054, 62101594); in part by the Pearl River Talent Recruitment Program (2021QN02S643); Guangzhou Basic Research Program (2023A04J1699); in part by the National Research Foundation, Singapore; Infocomm Media Development Authority under its Future Communications Research Development Programme; DSO National Laboratories under the AI Singapore Programme under AISG Award No AISG2-RP-2020-019; Energy Research Test-Bed and Industry Partnership Funding Initiative, Energy Grid (EG) 2.0 programme; DesCartes and the Campus for Research Excellence and Technological Enterprise (CREATE) programme; MOE Tier 1 under Grant RG87/22; in part by the Singapore University of Technology and Design (SUTD) (SRG-ISTD-2021-165); in part by the SUTD-ZJU IDEA Grant SUTD-ZJU (VP) 202102; and in part by the Ministry of Education, Singapore, through its SUTD Kickstarter Initiative (SKI 20210204).
Abstract: Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising way to decrease computation overhead and reduce task processing latency. However, the high mobility of vehicles makes it challenging for each vehicle to make avatar migration decisions independently based on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design a multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. MAPPO suffers from slow convergence due to the curse of dimensionality and from non-stationarity caused by shared parameters. To overcome these issues, we further propose a transformer-based MAPPO approach that uses sequential decision-making models to represent relationships among agents efficiently. Finally, we apply smart contracts and blockchain technologies to achieve secure sharing management. This motivates terrestrial and non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and ensures traceability of the sharing records. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution in UAV-assisted vehicular Metaverses by approximately 20%.
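MAPPO trains each agent's policy with PPO's clipped surrogate objective, typically using advantages estimated by a centralized value function. The sketch evaluates that loss for a batch of samples, given log-probabilities under the new and old policies. The advantage values are placeholders. The transformer-based sequential variant proposed in the paper is not reproduced.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)  # minimizing this maximizes the surrogate
```

The clip stops a single update from pushing the probability ratio far from 1 when the advantage is positive. Taking the minimum keeps the pessimistic, unclipped term when the advantage is negative.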
Funding: The National Natural Science Foundation of China (62203356) and the Fundamental Research Funds for the Central Universities of China (31020210502002).
Abstract: This paper studies time-varying formation control with finite-time prescribed performance for nonstrict-feedback second-order multi-agent systems with unmeasured states and unknown nonlinearities. To deal with the unknown nonlinearities, neural networks are used to approximate the inherent dynamics of the system. In addition, because of the limitations of actual working conditions, each follower agent can obtain only locally measurable partial state information of the leader agent. To address this problem, a neural network state observer based on the leader's state information is designed. Then, a finite-time prescribed performance adaptive output feedback control strategy is proposed by restricting the sliding mode surface to a prescribed region. This strategy ensures that the closed-loop system is practically finite-time stable. It also ensures that the formation errors of the multi-agent system converge to the prescribed performance bound in finite time. Finally, a numerical simulation demonstrates the practicality and effectiveness of the developed algorithm.
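A finite-time prescribed performance bound is a function rho(t) that shrinks from rho_0 to a small final value rho_T and stays there after a settling time T. The controller keeps the tracked error, here the sliding-mode surface s(t), strictly inside |s(t)| < rho(t). The polynomial form below is one common choice, assumed for illustration. The abstract gives neither the paper's exact function nor its parameter values.

```python
def finite_time_bound(t, rho_0=2.0, rho_T=0.05, T=1.0, p=2.0):
    """rho(t) = (rho_0 - rho_T)(1 - t/T)^p + rho_T for t < T, and rho_T afterwards."""
    if t >= T:
        return rho_T
    return (rho_0 - rho_T) * (1.0 - t / T) ** p + rho_T


def inside_funnel(error, t, **params):
    """True if the tracked error satisfies |error| < rho(t)."""
    return abs(error) < finite_time_bound(t, **params)
```

With p > 1, the bound reaches rho_T with zero slope at t = T, so it stays continuous and smooth at the settling time.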
Funding: National Natural Science Foundation of China (61973037); National 173 Program Project (2019-JCJQ-ZD-324).
Abstract: Traditional fuze interference systems use a single, uniform interference form, which gives a low interference success rate against air defense missile radio fuzes. To solve this problem, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to enlarge the state space. Second, a multidimensional action space whose search range changes with the projectile distance is used to select parameters and minimize the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function determines the reward value from the range state, output power, and number of parameters of the interference form. Offline experiments covering full-range missile rendezvous verified the method's effectiveness in two respects: selecting the parameter range of the action space and designing the discrimination degree of the reward function. The optimal interference form for each distance state was obtained. Compared with the single-interference decision method, the proposed decision method effectively improves the interference success rate.
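The pieces described above can be sketched as tabular Q-learning:

- distance bins serve as states;
- epsilon-greedy selection runs over the actions allowed in each range state;
- a weighted reward rewards a silenced fuze and penalizes output power and the number of interference parameters.

The distance bins, action names, reward weights and learning rates are all hypothetical. The abstract does not give the paper's values.

```python
import random


def range_state(distance, edges=(500.0, 200.0, 50.0)):
    """Map missile-target distance (m) to a discrete range state (hypothetical bins)."""
    for i, edge in enumerate(edges):
        if distance > edge:
            return i
    return len(edges)


def weighted_reward(fuze_silenced, power, n_params, max_power, max_params,
                    weights=(0.6, 0.2, 0.2)):
    """Reward success; penalize output power and the number of interference parameters."""
    w_s, w_p, w_n = weights
    return (w_s * (1.0 if fuze_silenced else 0.0)
            + w_p * (1.0 - power / max_power)
            + w_n * (1.0 - n_params / max_params))


def choose_action(q, state, allowed, eps, rng):
    """Epsilon-greedy over the actions allowed in this range state."""
    if rng.random() < eps:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q.get(state, {}).get(a, 0.0))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning: Q <- Q + alpha * (r + gamma * max_a' Q(s', a') - Q)."""
    best_next = max(q.get(next_state, {}).values(), default=0.0)
    old = q.setdefault(state, {}).get(action, 0.0)
    q[state][action] = old + alpha * (reward + gamma * best_next - old)
```

Restricting `allowed` per range state mirrors the idea of a search range that changes with distance. Only parameter sets considered plausible at that range are ever tried.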
Funding: Research Grants Council of Hong Kong under Grant CityU-11205221.
Abstract: This article investigates robust adaptive leaderless consensus for heterogeneous uncertain nonminimum-phase linear multi-agent systems over directed communication graphs. Each agent is assumed to have unknown nominal dynamics and to be subject to external disturbances and/or unmodeled dynamics. A novel distributed robust adaptive control strategy is proposed, and it is shown to solve the robust adaptive leaderless consensus problem under some sufficient conditions. Two examples are provided to demonstrate the efficacy of the proposed control strategy.
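For intuition about leaderless consensus over a directed graph, the sketch runs the basic discrete-time protocol for single-integrator agents with known dynamics. In this protocol, each agent moves toward the states it receives. This is a much simpler setting than the robust adaptive scheme for heterogeneous nonminimum-phase agents studied in the article. The ring graph and step size are illustrative assumptions.

```python
def consensus(x0, adjacency, eps=0.1, steps=500):
    """Discrete-time consensus x_i <- x_i + eps * sum_j a_ij (x_j - x_i).

    adjacency[i][j] = 1 if agent i receives agent j's state (directed edge j -> i).
    Choosing eps below 1 / (max in-degree) keeps each update a convex combination,
    a standard sufficient condition for convergence when the graph has a
    directed spanning tree.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        x = [x[i] + eps * sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
             for i in range(n)]
    return x
```

On a directed ring, every agent has in-degree and out-degree 1. Such a graph is balanced, so the agents agree on the average of their initial states.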
Funding: Anhui Province Natural Science Research Project of Colleges and Universities (2023AH040321); Excellent Scientific Research and Innovation Team of Anhui Colleges (2022AH010098).
Abstract: The numerous uncertainties in hybrid decision information systems (HDISs) make attribute reduction a formidable task. Existing attribute reduction algorithms struggle to handle several of these uncertainties in HDISs at once. Such algorithms include those based on Pawlak attribute importance, the Skowron discernibility matrix, and information entropy. The uncertainties include precisely measuring the differences between nominal attribute values and dealing with attributes that have fuzzy boundaries or abnormal values. To address these issues, this paper studies attribute reduction in HDISs. First, a novel metric based on the decision attribute, called the supervised distance, is introduced to quantify the differences between nominal attribute values accurately. Then, based on this metric, a novel fuzzy relation is defined from the perspective of “feedback on parity of attribute values to attribute sets”; this relation helps address the challenges posed by abnormal attribute values. Furthermore, using this fuzzy relation, a fuzzy conditional information entropy is defined to handle fuzzy attributes. It effectively quantifies the uncertainty associated with fuzzy attribute values, providing a robust framework for handling fuzzy information in hybrid information systems. Finally, an attribute reduction algorithm based on the fuzzy conditional information entropy is presented. Experimental results on 12 datasets show that the average reduction rate of our algorithm reaches 84.04%. Classification accuracy improves by 3.91% compared with the original datasets and by 11.25% on average compared with 9 other state-of-the-art reduction algorithms. These results indicate that our algorithm is highly effective in managing the intricate uncertainties inherent in hybrid data.
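The sketch below shows one common form of fuzzy conditional entropy and a greedy forward reduct built on it. The entropy is H(D|B) = -(1/n) * sum_i log2(|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|), where the cardinality of a fuzzy class is the sum of its memberships. The sketch makes two assumptions:

- Attributes are numeric and already normalized to [0, 1].
- Similarity on an attribute set is the minimum of 1 - |difference| over its attributes.

It does not reproduce the paper's supervised distance for nominal attributes or its parity-feedback fuzzy relation.

```python
import math


def fuzzy_relation(data, attrs):
    """r_ij = min over attrs of (1 - |a(x_i) - a(x_j)|); all ones for an empty set."""
    n = len(data)
    return [[min((1 - abs(data[i][a] - data[j][a]) for a in attrs), default=1.0)
             for j in range(n)] for i in range(n)]


def fuzzy_conditional_entropy(relation, labels):
    """H(D|B) = -(1/n) sum_i log2(|[x_i]_B ∩ [x_i]_D| / |[x_i]_B|)."""
    n = len(labels)
    h = 0.0
    for i in range(n):
        row = relation[i]
        same_class = sum(row[j] for j in range(n) if labels[j] == labels[i])
        h -= math.log2(same_class / sum(row))  # r_ii = 1, so no log(0)
    return h / n


def greedy_reduct(data, labels, tol=1e-9):
    """Add the attribute that lowers H(D|B) most until H(D|B) <= H(D|C) + tol."""
    all_attrs = list(range(len(data[0])))
    target = fuzzy_conditional_entropy(fuzzy_relation(data, all_attrs), labels)
    reduct = []
    current = fuzzy_conditional_entropy(fuzzy_relation(data, reduct), labels)
    while current > target + tol:
        best = min((a for a in all_attrs if a not in reduct),
                   key=lambda a: fuzzy_conditional_entropy(
                       fuzzy_relation(data, reduct + [a]), labels))
        reduct.append(best)
        current = fuzzy_conditional_entropy(fuzzy_relation(data, reduct), labels)
    return reduct
```

In the test data, the second attribute is constant and carries no information, so the reduct keeps only the first attribute.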
Funding: The financial support from the Major Science and Technology Programs in Henan Province (Grant No. 241100210100), the National Natural Science Foundation of China (Grant No. 62102372), the Henan Provincial Department of Science and Technology Research Project (Grant No. 242102211068), the Henan Provincial Department of Science and Technology Research Project (Grant No. 232102210078), the Stabilization Support Program of the Shenzhen Science and Technology Innovation Commission (Grant No. 20231130110921001), and the Key Scientific Research Project of Higher Education Institutions of Henan Province (Grant No. 24A520042) is acknowledged.
Abstract: The rapid growth of network services leads to long service request processing times and high deployment costs when deploying network function virtualization service function chains (SFCs) in 5G networks. To address these problems, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). First, for the resource-constrained network case, an optimization model of SFC deployment is constructed. The model aims to increase the request acceptance rate while minimizing latency and deployment cost. Next, the dynamic problem is modeled as a Markov decision process (MDP) to adapt to the evolving states of network resources. Finally, SFCs are allocated to different agents that follow a collaborative deployment strategy, each agent aiming to maximize the request acceptance rate or minimize latency and cost. The agents learn strategies from historical data of the virtual network functions in SFCs to guide server node selection. They achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Simulation results indicate that the proposed method meets performance requirements and resource capacity constraints. Compared with the baseline algorithms, it effectively increases the request acceptance rate and reduces end-to-end latency by 4.942% and deployment cost by 8.045%.
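MADDPG's centralized-training, decentralized-execution scheme has three main pieces:

- each agent's critic sees the joint observations and actions of all agents during training;
- each actor acts on its own observation only;
- target networks are tracked by soft updates, theta' <- tau * theta + (1 - tau) * theta'.

The sketch shows these pieces on plain lists of numbers standing in for network inputs and parameters. The SFC-specific state, action and reward design of MADDPG-SD is not described in the abstract and is not modeled.

```python
def soft_update(target, source, tau=0.01):
    """Polyak averaging: theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


def critic_input(observations, actions):
    """Centralized critic input: every agent's observation, then every agent's action."""
    joint = []
    for obs in observations:
        joint.extend(obs)
    for act in actions:
        joint.extend(act)
    return joint


def actor_input(observations, agent_index):
    """Decentralized actor input: the agent's own observation only."""
    return list(observations[agent_index])
```

Because only the actors are needed at execution time, each deployed agent can choose server nodes from local information. The joint critic is used only during training.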
Funding: Supported in part by the National Natural Science Foundation of China (62136008, 62236002, 61921004, 62173251, 62103104), the “Zhishan” Scholars Programs of Southeast University, and the Fundamental Research Funds for the Central Universities (2242023K30034).
Abstract: Efficient exploration in complex coordination tasks is considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult in tasks with latent variables that agents cannot directly observe. However, most existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of their influence on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. Called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, it uses variational inference theory to infer a compact latent variable representation space from a large amount of offline experience. In addition, we derive a counterfactual policy whose input contains no latent variables. We then quantify the difference between the actual policy and the counterfactual policy with a distance function. This quantified difference serves as an intrinsic motivation that gives additional rewards according to how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders. The experimental results demonstrate the effectiveness of MASAC-LV compared with other baseline algorithms.
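The abstract leaves the distance function between the actual and counterfactual policies unspecified. Suppose both are diagonal Gaussian policies, as is usual for soft actor-critic in continuous control. Then one natural choice is their KL divergence, which has a closed form. The sketch adds this divergence to the environment reward with a hypothetical weight beta. It is an illustration under these assumptions, not MASAC-LV's actual intrinsic reward.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) for diagonal Gaussians, summed over action dimensions."""
    kl = 0.0
    for m1, s1, m2, s2 in zip(mu_p, std_p, mu_q, std_q):
        kl += math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5
    return kl


def shaped_reward(extrinsic, actual, counterfactual, beta=0.1):
    """Environment reward plus beta * KL(actual policy || counterfactual policy).

    actual and counterfactual are (means, stds) pairs for one agent's action
    distribution in the current state.
    """
    (mu_a, std_a), (mu_c, std_c) = actual, counterfactual
    return extrinsic + beta * gaussian_kl(mu_a, std_a, mu_c, std_c)
```

When the latent variable has no influence on an agent, the two policies coincide, the KL term is zero, and the agent receives only the environment reward.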