In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
To address the low solution accuracy and high decision pressure that a single agent faces in large-scale dynamic task allocation (DTA) with a high-dimensional decision space, this paper combines deep reinforcement learning (DRL) theory with a multi-agent architecture and proposes MADDPG-D2, an improved Multi-Agent Deep Deterministic Policy Gradient algorithm with a dual experience replay pool and dual noise, to improve the efficiency of DTA. The algorithm builds on the traditional Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm: a double-noise mechanism enlarges the action exploration space in the early stage of training, and a double experience pool improves data utilization. In addition, to accelerate agent training and solve the cold-start problem, a priori knowledge is applied during training. Finally, MADDPG-D2 is compared and analyzed on a digital battlefield of ground-air confrontation. The experimental results show that agents trained by MADDPG-D2 achieve higher win rates and average rewards, utilize resources more reasonably, and better overcome the difficulty that traditional single-agent algorithms face in high-dimensional decision spaces. The proposed multi-agent MADDPG-D2 algorithm thus shows clear advantages for DTA.
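The abstract does not spell out how the dual experience pool is organized; purely as an illustration, one plausible reading — a high-reward pool and an ordinary pool sampled at a fixed ratio — can be sketched as follows (the reward threshold and sampling ratio are assumptions, not the paper's values):

```python
import random
from collections import deque

class DualReplayBuffer:
    """Two-pool replay sketch: high-reward transitions go to a priority
    pool, the rest to an ordinary pool; minibatches mix both at a fixed
    ratio. Illustrative only -- the paper's exact criterion is not given."""

    def __init__(self, capacity=1000, reward_threshold=0.5, priority_ratio=0.5):
        self.ordinary = deque(maxlen=capacity)
        self.priority = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold
        self.priority_ratio = priority_ratio

    def add(self, state, action, reward, next_state, done):
        pool = self.priority if reward >= self.reward_threshold else self.ordinary
        pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # draw from the priority pool first, fill the rest from the ordinary pool
        n_pri = min(int(batch_size * self.priority_ratio), len(self.priority))
        n_ord = min(batch_size - n_pri, len(self.ordinary))
        return (random.sample(self.priority, n_pri) +
                random.sample(self.ordinary, n_ord))
```

A real MADDPG-style trainer would feed sampled minibatches to per-agent critics; this sketch only shows the storage-and-sampling split.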
Beamforming is significant for millimeter-wave multi-user massive multi-input multi-output systems. Meanwhile, the overhead of channel state information acquisition and beam training is considerable, especially in dynamic environments. To reduce this overhead, we propose a multi-user beam tracking algorithm using a distributed deep Q-learning method. By learning users' moving trajectories online, the proposed algorithm learns to scan a beam subspace to maximize the average effective sum rate. Considering practical implementation, we model continuous beam tracking as a non-Markov decision process and thus develop a simplified training scheme for deep Q-learning to reduce training complexity. Furthermore, we propose a scalable state-action-reward design for scenarios with different numbers of users and antennas. Simulation results verify the effectiveness of the designed method.
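As a toy illustration of the action-value update underlying Q-learning-based beam selection (the paper uses deep networks over user trajectories; this stateless, bandit-style sketch with a hypothetical `rate_fn` keeps only the core idea):

```python
import random

def q_beam_tracking(rate_fn, n_beams=8, episodes=2000, alpha=0.2,
                    epsilon=0.1, seed=0):
    """Toy stateless Q-learning beam selector (illustrative only):
    each 'action' is a beam index and the reward is the achieved rate.
    rate_fn is a stand-in for the channel's effective rate per beam."""
    rng = random.Random(seed)
    q = [0.0] * n_beams
    for _ in range(episodes):
        # epsilon-greedy exploration over beam indices
        if rng.random() < epsilon:
            beam = rng.randrange(n_beams)
        else:
            beam = max(range(n_beams), key=lambda b: q[b])
        # bandit-style update (no next-state term)
        q[beam] += alpha * (rate_fn(beam) - q[beam])
    return q
```

With a synthetic channel where one beam dominates, the learned Q-values concentrate on that beam.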
Model checking is an automated formal verification method for checking whether epistemic multi-agent systems adhere to property specifications. Although there is an extensive literature on qualitative properties such as safety and liveness, quantitative and uncertain properties of these systems still lack verification methods. In uncertain environments, agents must make judicious decisions based on subjective epistemic states. To verify epistemic and measurable properties of multi-agent systems, this paper extends fuzzy computation tree logic with epistemic modalities and proposes a new Fuzzy Computation Tree Logic of Knowledge (FCTLK). We represent fuzzy multi-agent systems as distributed knowledge bases with fuzzy epistemic interpreted systems. In addition, we provide a transformation algorithm from fuzzy epistemic interpreted systems to fuzzy Kripke structures, as well as transformation rules from FCTLK formulas to Fuzzy Computation Tree Logic (FCTL) formulas. Accordingly, we reduce the FCTLK model checking problem to FCTL model checking, which enables FCTLK formulas to be verified with the fuzzy model checking algorithm of FCTL without additional computational overhead. Finally, we present correctness proofs and complexity analyses of the proposed algorithms, and further illustrate the practical application of our approach with an example of a train control system.
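To give a flavor of fuzzy model checking: under one common semantics (assumed here, since the abstract does not state the paper's exact definitions), the fuzzy EX operator over a fuzzy Kripke structure evaluates, at each state, to the maximum over successors of min(transition degree, formula degree):

```python
def fuzzy_ex(R, phi):
    """Fuzzy EX semantics sketch: the degree to which 'some next state
    satisfies phi' holds at s is  max over successors t of
    min(R(s, t), phi(t)).  R maps each state to {successor: degree};
    every state is assumed to have at least one successor."""
    return {s: max(min(w, phi[t]) for t, w in succ.items())
            for s, succ in R.items()}
```

More complex operators (EU, EG) are computed as fixpoints over this kind of max-min step.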
In multi-agent systems, joint actions must be employed to achieve cooperation because the evaluation of an agent's behavior often depends on the other agents' behaviors. However, joint-action reinforcement learning algorithms suffer from slow convergence because of the enormous learning space produced by joint actions. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, in which every agent learns to predict the probabilities of the actions that other agents may execute. A multi-robot cooperation experiment tests the efficacy of the new algorithm, and the results show that it reaches the cooperation policy much faster than the primitive reinforcement learning algorithm.
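The simplest predictor consistent with this idea — an empirical action-frequency model in the spirit of fictitious play; the paper's actual predictor is not specified in the abstract — might look like:

```python
from collections import Counter

class ActionPredictor:
    """Predicts another agent's next action as its empirical action
    frequency (a fictitious-play style sketch, not the paper's model).
    Before any observation, a uniform prior over the action set is used."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = Counter()

    def observe(self, action):
        self.counts[action] += 1

    def probabilities(self):
        total = sum(self.counts.values())
        if total == 0:  # uniform prior before any observation
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: self.counts[a] / total for a in self.actions}
```

An agent would condition its own action choice on these predicted probabilities instead of searching the full joint-action space.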
To remedy the deficiencies of conventional traffic control methods, this paper proposes a new traffic control method based on multi-agent technology, distinguishing agent-based traffic control from the conventional approach. The composition and structure of a multi-agent system (MAS) are first discussed. Then, step-coordination strategies for intersection agents, segment agents, and area agents are put forward. The advantages of the algorithm are demonstrated by a simulation study.
The resource-constrained project scheduling problem (RCPSP) is examined, and a decision-making model based on multi-agent systems (MAS) and general equilibrium marketing is proposed. An algorithm leading to the resource allocation decision involved in RCPSP has also been developed; this algorithm can be used in multi-project scheduling as well. Finally, an illustration is given.
This paper presents an autonomous navigation approach based on fuzzy Q-learning (FQL) and optical flow. The FQL method makes decisions in an unknown environment, without a map, using motion information and a reinforcement signal fed into an evolutionary algorithm. The reinforcement signal is calculated by estimating optical flow densities in regions of the camera image to determine whether they are "dense" or "thin", which correlates with the proximity of objects. The results show that the approach improves the learning rate compared with a method using a simple reward system and no evolutionary component. The proposed system was implemented in a virtual robotics environment using CoppeliaSim in communication with Python.
This paper introduces a multi-agent system that integrates process planning and production scheduling, in order to increase the flexibility of manufacturing systems in coping with rapid changes in a dynamic market and with internal uncertainties such as machine breakdown or resource shortage. The system consists of various autonomous agents, each capable of communicating with the others and making decisions based on its own knowledge and, if necessary, on information provided by other agents. Machine agents, which represent the machines, play an important role in that they negotiate with each other to bid for jobs. An iterative bidding mechanism is proposed to facilitate the assignment of jobs to machines and to handle the negotiation between agents. This mechanism enables near-optimal process plans and production schedules to be produced concurrently, so that dynamic changes in the market can be coped with at minimum cost and the utilisation of manufacturing resources can be optimised. In addition, a currency scheme with currency-like metrics is proposed to encourage or prohibit machine agents from putting forward bids for the jobs announced. The values of the metrics are adjusted iteratively so as to obtain an integrated plan and schedule that yield the minimum total production cost while satisfying product due dates. To solve the resulting optimisation problem, i.e. to what degree and how the currencies should be adjusted in each iteration, a genetic algorithm (GA) is developed. Comparisons are made between the GA approach and a simulated annealing (SA) optimisation technique.
In this paper, a local-learning algorithm for multi-agent systems is presented, based on the fact that an individual agent performs local perception and local interaction within a group environment. In individual learning, an agent adopts a greedy strategy to maximize its reward when interacting with the environment. In group learning, local interaction takes place between pairs of agents. A local-learning algorithm that chooses and modifies agents' actions is proposed to improve the traditional Q-learning algorithm in zero-sum games and in general-sum games with a unique equilibrium or multiple equilibria. This local-learning algorithm is proved to be convergent, and its computational complexity is lower than that of Nash-Q. Additionally, grid-game tests indicate that with this local-learning algorithm, the local behaviors of agents can spread globally.
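For context on the game-theoretic targets such algorithms compute in the zero-sum case: the value of a 2x2 zero-sum game has a closed form. A small helper illustrating this standard textbook result (not taken from the paper):

```python
def zero_sum_2x2_value(A):
    """Value of a 2x2 zero-sum game for the row player with payoff
    matrix A = ((a, b), (c, d)). If a pure saddle point exists it is
    returned; otherwise the mixed-strategy value is computed in closed
    form. Standard game theory, shown only to illustrate the targets
    that Nash-Q style updates aim at."""
    (a, b), (c, d) = A
    maximin = max(min(a, b), min(c, d))   # row player's guaranteed payoff
    minimax = min(max(a, c), max(b, d))   # column player's cap
    if maximin == minimax:                # pure saddle point
        return maximin
    # no saddle point: mixed value v = (a*d - b*c) / (a + d - b - c)
    return (a * d - b * c) / (a + d - b - c)
```

Matching pennies, for example, has value 0, reflecting the fully mixed equilibrium.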
We deal with a consensus control problem for a group of third-order agents networked by digraphs. Assuming that the control input of each agent is constructed from weighted differences between its states and those of its neighbor agents, we propose an algorithm for computing the weighting coefficients in the control input. The problem reduces to designing Hurwitz polynomials with real or complex coefficients. We show that by using Hurwitz polynomials with complex coefficients, a necessary and sufficient condition can be obtained for designing the consensus algorithm. Since the condition is both necessary and sufficient, it provides a parametrization of all weighting coefficients achieving consensus. Moreover, the condition is a natural extension of second-order consensus, and is reasonable and practical owing to its comparatively low computational burden. The result is also extended to the case where communication delay exists in the control input.
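For real coefficients, the third-order case reduces to the classical Routh-Hurwitz criterion; a one-line check of this standard condition (textbook material, not the paper's complex-coefficient extension):

```python
def is_hurwitz_cubic(a, b, c):
    """Routh-Hurwitz test for p(s) = s^3 + a*s^2 + b*s + c with real
    coefficients: all roots lie in the open left half-plane iff
    a > 0, c > 0 and a*b > c. Conditions of this kind constrain the
    weighting-coefficient design described above."""
    return a > 0 and c > 0 and a * b > c
```

For instance, (s+1)^3 = s^3 + 3s^2 + 3s + 1 passes, while s^3 + s^2 + s + 2 fails because 1*1 < 2.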
This paper examines a consensus problem in multi-agent discrete-time systems, where each agent can exchange information only with its neighbor agents. A decentralized protocol is designed to steer all agents to the same state vector. The design condition is expressed as a linear matrix inequality. Finally, a simulation example is presented and a comparison is made to demonstrate the effectiveness of the developed methodology.
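A minimal sketch of the underlying decentralized averaging update (the standard discrete-time consensus protocol, not the paper's LMI-designed one; the step size and graph below are illustrative):

```python
def consensus_step(x, neighbors, eps=0.3):
    """One step of x_i(k+1) = x_i(k) + eps * sum_{j in N_i} (x_j - x_i):
    each agent moves toward its neighbors using only local information."""
    return [xi + eps * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)]

def run_consensus(x, neighbors, steps=100, eps=0.3):
    """Iterate the protocol; on a connected undirected graph with a
    small enough eps, all states converge to the initial average."""
    for _ in range(steps):
        x = consensus_step(x, neighbors, eps)
    return x
```

On an undirected graph the sum of states is preserved, so the agreement value is the initial mean.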
In the evolutionary game of a shared task among groups, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the result of an evolutionary game and facilitate completion of the task. First, based on multi-agent theory and to solve the problems in the original model, a negative-feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group, and a measure of the group intelligence level is defined to evaluate the group's evolutionary game results. Second, the Q-learning algorithm is used to strengthen the guiding effect of the negative-feedback tax penalty mechanism: its selection strategy is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that benefit task completion and stability under different negative-feedback factor values and group sizes, thereby improving the group intelligence level.
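The intent of a negative-feedback tax penalty can be sketched with a toy single-state Q-learner in which defection pays more raw reward but is taxed below cooperation; all payoffs and the tax value here are invented for illustration and are not the paper's model:

```python
import random

def tax_penalty_game(tax=1.5, rounds=2000, alpha=0.1, epsilon=0.1, seed=1):
    """Toy single-state Q-learner choosing cooperate ('C') or defect
    ('D'). Defection pays more raw reward (3 vs 2) but incurs a tax,
    so with a large enough tax the learned values favor cooperation."""
    rng = random.Random(seed)
    q = {"C": 0.0, "D": 0.0}
    for _ in range(rounds):
        # epsilon-greedy action selection
        a = rng.choice(["C", "D"]) if rng.random() < epsilon else max(q, key=q.get)
        r = 2.0 if a == "C" else 3.0 - tax   # tax applies only to defection
        q[a] += alpha * (r - q[a])
    return q
```

Varying `tax` flips which strategy the learner settles on, mirroring the guiding role of the penalty mechanism.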
Reconfigurability of the electrical network in a shipboard power system (SPS) after a failure is central to restoring the power supply and improves the survivability of the SPS. The navigational process creates a sequence of different operating conditions, and the priority of some loads differs across these conditions. After analyzing the characteristics of a typical SPS, a model was developed using a grade III switchboard and an environmental prioritizing agent (EPA) algorithm, chosen because it is logically and physically decentralized as well as multi-agent oriented. The EPA algorithm decides the dynamic load priority and then selects the means to best meet the maximum power supply load. The simulation results showed that higher-priority loads were the first to be restored, and the system satisfied all necessary constraints, demonstrating the effectiveness and validity of the proposed method.
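The priority-first restoration rule can be sketched as a greedy pass over loads sorted by priority, subject to remaining generation capacity (the load names and numbers below are hypothetical, not from the paper):

```python
def restore_loads(loads, capacity):
    """Greedy restoration sketch: restore the highest-priority loads
    first while generation capacity remains. `loads` is a list of
    (name, priority, power_demand) tuples; higher priority wins."""
    restored = []
    for name, priority, power in sorted(loads, key=lambda l: -l[1]):
        if power <= capacity:       # restore only if capacity allows
            restored.append(name)
            capacity -= power
    return restored
```

A dynamic-priority scheme like the EPA would recompute the priority field per operating condition before each pass.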
In urban Vehicular Ad hoc Networks (VANETs), the high mobility of the vehicular environment and frequently changing network topology call for a low-delay end-to-end routing algorithm. In this paper, we propose a Multi-Agent Reinforcement Learning (MARL) based decentralized routing scheme that exploits the inherent similarity between the routing problem in VANETs and the MARL problem. The proposed scheme models the interaction between vehicles and the environment as a multi-agent problem in which each vehicle autonomously establishes a communication channel with a neighbor device without global information. Simulations performed in the 3GPP Manhattan mobility model demonstrate that the proposed decentralized routing algorithm achieves less than 45.8 ms average latency and high stability, with a 0.05% average failure rate under varying vehicle capacities.
To guarantee the overall production performance of the multiple departments in an air-conditioner production enterprise, a multidisciplinary design optimization model for the production system is established based on multi-agent technology. Local operation models for the plan, marketing, sales, and purchasing departments, as well as production and warehouse, are formulated as individual agents, and their respective local objectives are collectively formulated as a multi-objective optimization problem. Considering the coupling effects among correlated agents, the optimization is carried out with a self-adaptive chaos immune optimization algorithm with mutative scale. The numerical results indicate that the proposed multi-agent optimization model truly reflects the actual situation of the air-conditioner production system. The proposed multi-agent multidisciplinary design optimization method can help companies enhance their income ratio and profit by about 33% and 36%, respectively, and reduce total cost by about 1.8%.
Routing plays a critical role in data transmission for underwater acoustic sensor networks (UWSNs) in the Internet of Underwater Things (IoUT). Traditional routing methods suffer from high end-to-end delay, limited bandwidth, and high energy consumption. With the development of artificial intelligence and machine learning, many researchers have applied these methods to improve routing quality. In this paper, we propose a Q-learning-based multi-hop cooperative routing protocol (QMCR) for UWSNs. Our protocol automatically chooses the node with the maximum Q-value as the forwarder, based on distance information. Moreover, we combine cooperative communications with the Q-learning algorithm to reduce network energy consumption and improve communication efficiency. Experimental results show that the running time of QMCR is less than one-tenth that of the artificial fish-swarm algorithm (AFSA), while routing energy consumption is kept at the same level. Given its speed, QMCR is a promising routing design for UWSNs, especially under the extremely dynamic underwater acoustic channels of the real ocean environment.
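The forwarding rule — each node picks the neighbor with the maximum Q-value — can be sketched as follows, with the learned Q-values stood in for by negative Euclidean distance to the sink (an assumption of this sketch, echoing the protocol's use of distance information):

```python
import math

def greedy_route(positions, links, src, sink):
    """Greedy next-hop routing sketch: at each node, forward to the
    neighbor with the highest Q-value, here approximated as the
    negative distance to the sink. Assumes a loop-free topology in
    which every hop makes progress; a hop bound guards against loops."""
    def q(n):  # stand-in Q-value: closer to the sink is better
        (x, y), (sx, sy) = positions[n], positions[sink]
        return -math.hypot(x - sx, y - sy)

    path, node = [src], src
    for _ in range(len(links)):     # at most |nodes| hops
        if node == sink:
            break
        node = max(links[node], key=q)
        path.append(node)
    return path
```

In the actual protocol these Q-values would be learned and updated online rather than derived from geometry alone.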
Overhead hoist transporters (OHTs) have become the most appropriate tools for transporting wafer lots between inter-bay and intra-bay areas in unified layouts of automated material handling systems (AMHSs) in 300 mm semiconductor wafer fabrication. To obtain a conflict-free scheduling solution, an intelligent multi-agent-based control system framework was built to support the AMHS, and corresponding algorithms and rules were proposed to implement cooperation among agents. On this basis, a time-constraint-based heuristic scheduling algorithm was presented to support the routing decision agent in searching for the conflict-free shortest path. In constructing the algorithm, the conflicting intervals of the k shortest routes were identified with time window theory, the most available path was chosen with the objective of minimum completion time, and a backtracking method was combined to finish the routing scheduling. Finally, simulation experiments showed that the multi-agent framework is suitable and the proposed scheduling algorithm is feasible and valid.
Device-to-Device (D2D) communication is a promising technology that can reduce the burden on cellular networks while increasing network capacity. In this paper, we focus on channel resource allocation and power control to improve system resource utilization and network throughput. First, we treat each D2D pair as an independent agent that makes decisions based on the local channel state information it observes, and we propose a multi-agent Reinforcement Learning (RL) algorithm for this multi-user system. Since D2D pairs possess no information on the availability and quality of the resource blocks to be selected, the problem is modeled as a stochastic non-cooperative game: each agent becomes a player, and the players make decisions together to achieve global optimization, yielding a multi-agent Q-learning algorithm based on game theory. Second, to accelerate the convergence of multi-agent Q-learning, we consider a power allocation strategy based on the Fuzzy C-Means (FCM) algorithm: FCM first groups the D2D users, each group is treated as an agent, and multi-agent Q-learning then determines the power for each group. The simulation results show that the multi-agent Q-learning algorithm improves system throughput, and that FCM greatly speeds up its convergence while further improving throughput.
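Fuzzy C-Means itself is a standard algorithm; a plain one-dimensional implementation of the membership and center updates (the grouping step described above, with a deterministic spread initialization assumed for this sketch; requires c >= 2):

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Plain 1-D fuzzy C-means. Returns (centers, u) where u[i][k] is
    the membership of points[i] in cluster k. Initialization spreads
    the initial centers over the sorted data (a choice of this sketch)."""
    pts = sorted(points)
    centers = [pts[(len(pts) - 1) * k // (c - 1)] for k in range(c)]
    u = [[0.0] * c for _ in points]
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        for i, x in enumerate(points):
            d = [abs(x - ck) or 1e-12 for ck in centers]  # avoid /0
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
        # center update: weighted mean with weights u_ik^m
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(points))]
            centers[k] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
    return centers, u
```

In the scheme above, each resulting fuzzy group of D2D users would then be treated as one Q-learning agent.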
It is important to effectively harmonize the behaviors of the agents in a multi-agent system (MAS) to complete the solution process. Co-evolution computing techniques, inspired by natural selection and genetics, are usually used to solve such problems. Based on the learning and evolution mechanisms of biological systems, an adaptive co-evolution model is proposed in this paper, with inner-population, inter-population, and community learning operators. The adaptive co-evolution algorithm (ACEA) is designed in detail, and simulation experiments evaluate its performance. The results show that the ACEA is more effective and feasible than the genetic algorithm for solving the optimization problems.
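For reference, a minimal genetic algorithm of the kind the ACEA is compared against — tournament selection, one-point crossover, bit-flip mutation over bitstrings; all parameters are illustrative:

```python
import random

def genetic_maximize(fitness, n_bits=10, pop_size=30, generations=60,
                     p_mut=0.02, seed=0):
    """Minimal GA baseline sketch: maximizes `fitness` over bitstrings
    using tournament selection, one-point crossover, and bit-flip
    mutation, with the best-so-far individual kept as an elite."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)   # tournament of 3
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)              # one-point crossover
            child = [b ^ (rng.random() < p_mut)         # bit-flip mutation
                     for b in p1[:cut] + p2[cut:]]
            nxt.append(child)
        pop = nxt
        best = max(pop + [best], key=fitness)
    return best
```

A co-evolutionary method like the ACEA replaces the single population here with interacting populations and extra learning operators.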
Funding: This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under Grant No. IFPIP-1127-611-1443; the authors therefore acknowledge with thanks DSR technical and financial support.
Funding: This research was funded by the National Natural Science Foundation of China, Grant Number 62106283.
Funding: The work is partially supported by the Natural Science Foundation of Ningxia (Grant No. AAC03300), the National Natural Science Foundation of China (Grant No. 61962001), and the Graduate Innovation Project of North Minzu University (Grant No. YCX23152).
文摘Model checking is an automated formal verification method to verify whether epistemic multi-agent systems adhere to property specifications.Although there is an extensive literature on qualitative properties such as safety and liveness,there is still a lack of quantitative and uncertain property verifications for these systems.In uncertain environments,agents must make judicious decisions based on subjective epistemic.To verify epistemic and measurable properties in multi-agent systems,this paper extends fuzzy computation tree logic by introducing epistemic modalities and proposing a new Fuzzy Computation Tree Logic of Knowledge(FCTLK).We represent fuzzy multi-agent systems as distributed knowledge bases with fuzzy epistemic interpreted systems.In addition,we provide a transformation algorithm from fuzzy epistemic interpreted systems to fuzzy Kripke structures,as well as transformation rules from FCTLK formulas to Fuzzy Computation Tree Logic(FCTL)formulas.Accordingly,we transform the FCTLK model checking problem into the FCTL model checking.This enables the verification of FCTLK formulas by using the fuzzy model checking algorithm of FCTL without additional computational overheads.Finally,we present correctness proofs and complexity analyses of the proposed algorithms.Additionally,we further illustrate the practical application of our approach through an example of a train control system.
文摘In multi-agent systems, joint-action must be employed to achieve cooperation because the evaluation of the behavior of an agent often depends on the other agents’ behaviors. However, joint-action reinforcement learning algorithms suffer the slow convergence rate because of the enormous learning space produced by joint-action. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, which demands all agents to learn predicting the probabilities of actions that other agents may execute. A multi-robot cooperation experiment is run to test the efficacy of the new algorithm, and the experiment results show that the new algorithm can achieve the cooperation policy much faster than the primitive reinforcement learning algorithm.
文摘Aiming at the deficiency of conventional traffic control method, this paper proposes a new method based on multi-agent technology for traffic control. Different from many existing methods, this paper distinguishes traffic control on the basis of the agent technology from conventional traffic control method. The composition and structure of a multi-agent system (MAS) is first discussed. Then, the step-coordination strategies of intersection-agent, segment-agent, and area-agent are put forward. The advantages of the algorithm are demonstrated by a simulation study.
Abstract: A decision-making model for the resource-constrained project scheduling problem (RCPSP) is proposed, based on multi-agent systems (MAS) and general equilibrium market theory. An algorithm leading to the resource allocation decisions involved in RCPSP has also been developed; it can be used in the multi-project scheduling field as well. Finally, an illustration is given.
Abstract: This paper presents a fuzzy Q-learning (FQL) and optical-flow-based autonomous navigation approach. The FQL method makes decisions in an unknown environment without mapping, using motion information and a reinforcement signal fed into an evolutionary algorithm. The reinforcement signal is calculated by estimating the optical-flow densities in areas of the camera image to determine whether they are "dense" or "thin", which correlates with the proximity of objects. The results obtained show that the present approach improves the learning rate compared with a method that uses a simple reward system and lacks the evolutionary component. The proposed system was implemented in a virtual robotics environment using the CoppeliaSim software communicating with Python.
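The density-based reinforcement signal can be sketched as follows. The flow magnitudes, the dense/thin threshold, and the reward magnitudes are all assumptions for illustration, not values from the paper.

```python
# Toy optical-flow magnitudes for three camera regions (hypothetical data).
flow = {"left": [0.1, 0.2, 0.1], "center": [1.5, 1.8, 2.0], "right": [0.3, 0.2, 0.4]}
DENSE_THRESHOLD = 1.0  # assumed cutoff separating "dense" from "thin" regions

def region_density(mags):
    """Mean optical-flow magnitude of one region."""
    return sum(mags) / len(mags)

def reinforcement(flow):
    """Penalize dense flow (nearby obstacles), reward thin flow (free space)."""
    signal = 0.0
    for region, mags in flow.items():
        signal += -1.0 if region_density(mags) > DENSE_THRESHOLD else +0.5
    return signal

r = reinforcement(flow)  # left/right thin (+0.5 each), center dense (-1.0)
```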
Abstract: This paper introduces a multi-agent system that integrates process planning and production scheduling in order to increase the flexibility of manufacturing systems in coping with rapid changes in a dynamic market and with internal uncertainties such as machine breakdown or resource shortage. The system consists of various autonomous agents, each of which can communicate with the others and make decisions based on its own knowledge and, if necessary, on information provided by other agents. Machine agents, which represent the machines, play an important role in the system in that they negotiate with each other to bid for jobs. An iterative bidding mechanism is proposed to facilitate the assignment of jobs to machines and to handle the negotiation between agents. This mechanism enables near-optimal process plans and production schedules to be produced concurrently, so that dynamic changes in the market can be coped with at minimum cost and the utilisation of manufacturing resources can be optimised. In addition, a currency scheme with currency-like metrics is proposed to encourage or discourage machine agents from putting forward bids for the jobs announced. The values of the metrics are adjusted iteratively so as to obtain an integrated plan and schedule that results in the minimum total production cost while satisfying product due dates. To deal with the optimisation problem, i.e. to what degree and how the currencies should be adjusted in each iteration, a genetic algorithm (GA) is developed. Comparisons are made between the GA approach and the simulated annealing (SA) optimisation technique.
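A single announcement round of such currency-weighted bidding might look like the sketch below. The machine names, processing costs, and markup values are invented for illustration; the paper's iterative adjustment of the currency metrics is not reproduced.

```python
# Each machine bids its processing cost scaled by a currency-like markup;
# the announced job is awarded to the lowest bidder (simplified protocol).
costs = {"m1": 12.0, "m2": 9.0, "m3": 15.0}
markup = {"m1": 1.1, "m2": 1.3, "m3": 0.9}

bids = {m: costs[m] * markup[m] for m in costs}
winner = min(bids, key=bids.get)  # m2 bids 11.7, the lowest
```

Iterating this round while a GA tunes the markups is what drives the plan and schedule toward minimum total cost.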
Abstract: In this paper, a local-learning algorithm for multi-agent systems is presented, based on the fact that an individual agent performs local perception and local interaction within a group environment. In individual learning, an agent adopts a greedy strategy to maximize its reward when interacting with the environment. In group learning, local interaction takes place between each pair of agents. A local-learning algorithm for choosing and modifying agents' actions is proposed to improve the traditional Q-learning algorithm, in the settings of zero-sum games and of general-sum games with a unique equilibrium or multiple equilibria. This local-learning algorithm is proved to be convergent, and its computational complexity is lower than that of Nash-Q. Additionally, a grid-game test indicates that with this local-learning algorithm, the local behaviors of agents can spread globally.
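The greedy individual-learning step is essentially a standard Q-update. The tiny corridor example below (states, actions, reward) is invented to show one such update; the group-learning interaction between agent pairs is not modeled here.

```python
# Q-table for one agent in a 3-state corridor: state -> action -> value.
Q = {s: {a: 0.0 for a in ("left", "right")} for s in range(3)}
ALPHA, GAMMA = 0.5, 0.9

def greedy(state):
    """Greedy strategy: pick the action with the highest current Q-value."""
    return max(Q[state], key=Q[state].get)

def update(s, a, reward, s_next):
    """One Q-learning update toward reward plus discounted best next value."""
    Q[s][a] += ALPHA * (reward + GAMMA * max(Q[s_next].values()) - Q[s][a])

# Moving right from state 1 reaches the goal state 2 with reward 1.
update(1, "right", 1.0, 2)  # Q[1]["right"] becomes 0.5
```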
Funding: supported by the Japan Ministry of Education, Sciences and Culture (C21560471); the National Natural Science Foundation of China (61603268); the Research Project Supported by the Shanxi Scholarship Council of China (2015-044); and the Fundamental Research Project of Shanxi Province (2015021085)
Abstract: We deal with a consensus control problem for a group of third-order agents networked by digraphs. Assuming that the control input of each agent is constructed from the weighted differences between its states and those of its neighbor agents, we propose an algorithm for computing the weighting coefficients in the control input. The problem reduces to designing Hurwitz polynomials with real or complex coefficients. We show that by using Hurwitz polynomials with complex coefficients, a necessary and sufficient condition can be obtained for designing the consensus algorithm. Since the condition is both necessary and sufficient, we provide a parametrization of all the weighting coefficients achieving consensus. Moreover, the condition is a natural extension of second-order consensus, and is practical due to its comparatively low computation burden. The result is also extended to the case where communication delay exists in the control input.
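For real coefficients, the third-order stability condition is the classical Routh-Hurwitz test; the complex-coefficient polynomials used in the paper are more general, but the real case can be sketched as:

```python
def is_hurwitz_third_order(a2, a1, a0):
    """Routh-Hurwitz test for s^3 + a2*s^2 + a1*s + a0 with real coefficients:
    all coefficients positive and a2*a1 > a0."""
    return a2 > 0 and a0 > 0 and a2 * a1 > a0

stable = is_hurwitz_third_order(3.0, 3.0, 1.0)    # (s+1)^3: all roots at -1
unstable = is_hurwitz_third_order(1.0, 1.0, 2.0)  # a2*a1 = 1 < a0 = 2
```

In the consensus design, weighting coefficients would be chosen so that the induced characteristic polynomial of each agent's error dynamics passes such a test.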
Funding: supported by the Deanship of Scientific Research (CDSR) at KFUPM (RG-1316-1)
Abstract: This paper examines a consensus problem in multi-agent discrete-time systems, where each agent can exchange information only with its neighbor agents. A decentralized protocol is designed for each agent to steer all agents to the same vector. The design condition is expressed in the form of a linear matrix inequality. Finally, a simulation example is presented and a comparison is made to demonstrate the effectiveness of the developed methodology.
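A minimal discrete-time averaging iteration (a simplification, not the paper's LMI-based protocol) shows the idea of steering all agents to a common value using only neighbor information. The graph, step size, and initial states below are invented.

```python
# Undirected neighbor lists for four agents on a path graph.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
EPS = 0.3  # step size; must be below 1 / max degree for convergence here

x = [1.0, 3.0, 5.0, 7.0]
for _ in range(200):
    # Each agent moves toward the average of its neighbors' states.
    x = [xi + EPS * sum(x[j] - xi for j in neighbors[i]) for i, xi in enumerate(x)]
# On an undirected graph this converges to the initial average, 4.0.
```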
Funding: supported by the National Key R&D Program of China (2017YFB1400105)
Abstract: In an evolutionary game where a group shares the same task, changes in the game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the outcome of the evolutionary game and facilitate completion of the task. First, based on multi-agent theory and to solve the problems in the original model, a negative-feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, to evaluate the group's evolutionary game results in the model, a calculation method for the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative-feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that are beneficial to task completion and stability under different negative-feedback factor values and different group sizes, thereby improving the group intelligence level.
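The negative-feedback idea can be sketched with an assumed payoff form: a tax on defectors that grows with the current fraction of defectors, so defection stops paying once it becomes widespread. The base payoffs and tax factor below are invented, not the paper's parameters.

```python
def payoff(action, frac_defect, tax_factor=1.5):
    """Base game: defecting pays 2, cooperating pays 1; a negative-feedback
    tax on defectors grows with the defector fraction (assumed linear form)."""
    if action == "defect":
        return 2.0 - tax_factor * frac_defect
    return 1.0

# With few defectors the tax is small and defection still wins;
# with many defectors the tax flips the incentive toward cooperation.
low = payoff("defect", 0.2)   # 2 - 0.3 = 1.7
high = payoff("defect", 0.8)  # 2 - 1.2 = 0.8
```

A Q-learning agent facing these payoffs would gradually shift its strategy toward cooperation as defection spreads, which is the stabilizing effect the mechanism aims for.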
Funding: supported by the National Natural Science Foundation of China under Grant No. 60704004 and the Fundamental Research Funds for the Central University under Grant No. HEUCFT1005
Abstract: Reconfigurability of the electrical network in a shipboard power system (SPS) after a failure is central to restoring the power supply and improves the survivability of the SPS. The navigational process creates a sequence of different operating conditions, and the priority of some loads differs as operating conditions change. After analyzing the characteristics of a typical SPS, a model was developed using a grade III switchboard and an environmental prioritizing agent (EPA) algorithm. This algorithm was chosen because it is logically and physically decentralized as well as multi-agent oriented. The EPA algorithm decides on the dynamic load priorities and then selects the means to best meet the maximum power supply load. The simulation results showed that higher-priority loads were the first to be restored, and the system satisfied all necessary constraints, demonstrating the effectiveness and validity of the proposed method.
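Priority-first restoration under a capacity limit can be sketched with a greedy pass. The load names, priorities, demands, and capacity below are invented, and the sketch ignores the switchboard topology the paper models.

```python
# Loads: (name, priority under the current operating condition, demand in kW).
loads = [("radar", 3, 40.0), ("galley", 1, 25.0), ("steering", 5, 30.0),
         ("lighting", 2, 15.0)]
CAPACITY = 80.0  # hypothetical generation available after the fault

def restore(loads, capacity):
    """Greedily restore loads in descending priority while capacity remains."""
    restored, used = [], 0.0
    for name, _, demand in sorted(loads, key=lambda l: -l[1]):
        if used + demand <= capacity:
            restored.append(name)
            used += demand
    return restored

plan = restore(loads, CAPACITY)  # steering (30 kW) then radar (40 kW) fit
```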
Funding: supported by the National Science Foundation of China under Grants No. 61901403, 61790551, and 61925106; the Youth Innovation Fund of Xiamen No. 3502Z20206039; and the Tsinghua-Foshan Innovation Special Fund (TFISF) No. 2020THFS0109
Abstract: In urban Vehicular Ad hoc Networks (VANETs), the high mobility of the vehicular environment and the frequently changing network topology call for a low-delay end-to-end routing algorithm. In this paper, we propose a Multi-Agent Reinforcement Learning (MARL) based decentralized routing scheme that exploits the inherent similarity between the routing problem in VANETs and the MARL problem. The proposed scheme models the interaction between vehicles and the environment as a multi-agent problem in which each vehicle autonomously establishes a communication channel with a neighbor device without relying on global information. Simulation performed in the 3GPP Manhattan mobility model demonstrates that the proposed decentralized routing algorithm achieves less than 45.8 ms average latency and a high stability of 0.05% average failure rate with varying vehicle capacities.
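Decentralized next-hop selection can be sketched as each vehicle keeping a Q-value per neighbor link and updating it from delivery feedback, with no global knowledge. The neighbor names, reward scheme, and learning rate below are assumptions, not the paper's exact formulation.

```python
# Per-vehicle Q-values over neighbor links, learned from delivery feedback.
q = {"n1": 0.0, "n2": 0.0, "n3": 0.0}
ALPHA = 0.4  # assumed learning rate

def feedback(neighbor, reward):
    """Move the link's Q-value toward the observed delivery reward."""
    q[neighbor] += ALPHA * (reward - q[neighbor])

def next_hop():
    """Greedy decentralized choice: forward via the highest-valued neighbor."""
    return max(q, key=q.get)

# Packets forwarded via n2 succeed repeatedly; one via n3 is lost.
for _ in range(5):
    feedback("n2", 1.0)
feedback("n3", -1.0)
choice = next_hop()
```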
基金Project(60973132)supported by the National Natural Science Foundation of ChinaProject(2010B050400005)supported by the Science and Research Program of Guangdong Province,China
Abstract: To guarantee the overall production performance of multiple departments in an air-conditioner production enterprise, a multidisciplinary design optimization model for the production system is established based on multi-agent technology. Local operation models for the plan, marketing, sales, and purchasing departments, as well as for production and warehousing, are formulated as individual agents, and their respective local objectives are collectively formulated as a multi-objective optimization problem. Considering the coupling effects among the correlated agents, the optimization is carried out with a self-adaptive chaos immune optimization algorithm with mutative scale. The numerical results indicate that the proposed multi-agent optimization model truly reflects the actual situation of the air-conditioner production system. The proposed multi-agent based multidisciplinary design optimization method can help companies enhance their income ratio and profit by about 33% and 36%, respectively, and reduce the total cost by about 1.8%.
Funding: the National Key Research and Development Program of China under Grant No. 2016YFC1400200; in part by the Basic Research Program of Science and Technology of Shenzhen, China under Grant No. JCYJ20190809161805508; in part by the Fundamental Research Funds for the Central Universities of China under Grant No. 20720200092; in part by Xiamen University's Honors Program for Undergraduates in Marine Sciences under Grant No. 22320152201106; and in part by the National Natural Science Foundation of China under Grants No. 41476026, 41976178 and 61801139
Abstract: Routing plays a critical role in data transmission for underwater acoustic sensor networks (UWSNs) in the Internet of Underwater Things (IoUT). Traditional routing methods suffer from high end-to-end delay, limited bandwidth, and high energy consumption. With the development of artificial intelligence and machine learning, many researchers have applied these new methods to improve the quality of routing. In this paper, we propose a Q-learning-based multi-hop cooperative routing protocol (QMCR) for UWSNs. Our protocol automatically chooses the node with the maximum Q-value as the forwarder based on distance information. Moreover, we combine cooperative communications with the Q-learning algorithm to reduce network energy consumption and improve communication efficiency. Experimental results show that the running time of QMCR is less than one-tenth of that of the artificial fish-swarm algorithm (AFSA), while the routing energy consumption is kept at the same level. Owing to the extremely fast speed of the algorithm, QMCR is a promising routing design for UWSNs, especially under the highly dynamic underwater acoustic channels of real ocean environments.
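The distance-driven forwarder choice can be sketched by initializing each neighbor's Q-value with its distance advance toward the sink and picking the maximum. The node coordinates and initialization rule are assumptions; the paper's actual Q-update from delivery outcomes is simplified away here.

```python
import math

SINK = (0.0, 0.0)
# Hypothetical 2-D node positions (real deployments would be 3-D).
nodes = {"a": (40.0, 30.0), "b": (10.0, 10.0), "c": (60.0, 5.0)}

def dist_to_sink(p):
    return math.hypot(p[0] - SINK[0], p[1] - SINK[1])

def init_q(current, neighbors):
    """Assumed initialization: Q rewards the distance advance toward the sink."""
    d0 = dist_to_sink(nodes[current])
    return {n: d0 - dist_to_sink(nodes[n]) for n in neighbors}

q = init_q("a", ["b", "c"])
forwarder = max(q, key=q.get)  # b advances toward the sink; c moves away
```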
Funding: National Natural Science Foundation of China (No. 61273035, No. 71071115); National High-Tech R&D Program for CIMS, China (No. 2009AA043000)
Abstract: Overhead hoist transporters (OHTs) have become the most appropriate tools for transporting wafer lots between inter-bay and intra-bay areas in the united layouts of automated material handling systems (AMHSs) in 300 mm semiconductor wafer fabrication. To obtain a conflict-free scheduling solution, an intelligent multi-agent-based control system framework was built to support the AMHSs, and corresponding algorithms and rules were proposed to implement cooperation among agents. On this basis, a time-constraint-based heuristic scheduling algorithm was presented to support the routing decision agent in searching for the conflict-free shortest path. In the construction of the algorithm, the conflicting intervals of the k shortest routes were identified with time-window theory. The most available path was chosen with the objective of minimum completion time, and a backtracking method was combined to finish the routing scheduling. Finally, experiments on the proposed method were simulated. The results show that the multi-agent framework is suitable and that the proposed scheduling algorithm is feasible and valid.
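The core time-window check behind such conflict detection is interval overlap per track segment. The segment names, reservations, and candidate windows below are invented for illustration.

```python
# Reserved time windows (start, end) per track segment from earlier schedules.
reserved = {"seg1": [(0, 5), (12, 15)], "seg2": [(3, 9)]}

def conflict_free(route, windows):
    """Check that a candidate route's occupation windows avoid all reservations."""
    for seg, (s, e) in zip(route, windows):
        for (rs, re) in reserved.get(seg, []):
            if s < re and rs < e:  # open intervals overlap
                return False
    return True

ok = conflict_free(["seg1", "seg2"], [(6, 11), (10, 14)])   # slots between reservations
bad = conflict_free(["seg1", "seg2"], [(4, 8), (8, 12)])    # clashes on seg1
```

Running this test over the k shortest routes and backtracking on failure is the shape of the heuristic described above.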
Funding: This work was supported by the National Natural Science Foundation of China (61871058) and the Key Special Project in Intergovernmental International Scientific and Technological Innovation Cooperation of the National Key Research and Development Program (2017YFE0118600)
Abstract: Device-to-Device (D2D) communication is a promising technology that can reduce the burden on cellular networks while increasing network capacity. In this paper, we focus on channel resource allocation and power control to improve system resource utilization and network throughput. First, we treat each D2D pair as an independent agent. Each agent makes decisions based on the local channel state information it observes. A multi-agent Reinforcement Learning (RL) algorithm is proposed for this multi-user system. We assume that a D2D pair possesses no information on the availability and quality of the resource block to be selected, so the problem is modeled as a stochastic non-cooperative game. Each agent thus becomes a player, and they make decisions together to achieve global optimization; a multi-agent Q-learning algorithm based on game theory is thereby established. Second, to accelerate the convergence of multi-agent Q-learning, we consider a power allocation strategy based on the Fuzzy C-Means (FCM) algorithm. The strategy first groups the D2D users by FCM and treats each group as an agent, and then runs the multi-agent Q-learning algorithm to determine the power for each group of D2D users. The simulation results show that the multi-agent Q-learning algorithm improves the throughput of the system. In particular, FCM greatly speeds up the convergence of the multi-agent Q-learning algorithm while improving system throughput.
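The grouping step can be sketched with a minimal one-dimensional fuzzy C-means (fuzziness m = 2), clustering D2D users by position so that each resulting group can act as one learning agent. The user positions and deterministic initial centers are assumptions for illustration.

```python
# Minimal 1-D fuzzy C-means (m = 2) grouping D2D users by position.
data = [1.0, 1.2, 0.9, 5.0, 5.2, 4.8]  # hypothetical user positions
centers = [0.0, 6.0]                    # assumed deterministic initial centers
M = 2.0

for _ in range(20):
    # Membership of each point in each cluster (inverse-distance ratios).
    u = []
    for x in data:
        d = [abs(x - c) + 1e-9 for c in centers]
        u.append([1.0 / sum((d[k] / d[j]) ** (2 / (M - 1)) for j in range(2))
                  for k in range(2)])
    # Update centers as membership-weighted means.
    centers = [sum(u[i][k] ** M * data[i] for i in range(len(data))) /
               sum(u[i][k] ** M for i in range(len(data))) for k in range(2)]

# Hard group assignment: each user joins its highest-membership cluster.
groups = [max(range(2), key=lambda k: ui[k]) for ui in u]
```

Each of the two resulting groups would then be handled as a single Q-learning agent choosing a power level.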
基金Project of Shanghai Committee of Science and Technology, China ( No.08JC1400100, No. QB081404100)Leading Academic Discipline Project of Shanghai Municipal Education Commission, China (No.J51901)
Abstract: It is important to effectively harmonize the behaviors of the agents in a multi-agent system (MAS) to complete the solution process. Co-evolution computing techniques, inspired by natural selection and genetics, are usually used to solve these problems. Based on the learning and evolution mechanisms of biological systems, an adaptive co-evolution model is proposed in this paper, with inner-population, inter-population, and community learning operators. The adaptive co-evolution algorithm (ACEA) is designed in detail, and simulation experiments were run to evaluate its performance. The results show that the ACEA is more effective and feasible than the genetic algorithm for solving the optimization problems.
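A single-population evolutionary loop, a much-simplified stand-in for the inner-population learning operator, can be sketched as tournament selection plus Gaussian mutation. The fitness function, population size, and mutation scale are invented; ACEA's inter-population and community operators are not reproduced.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

def fitness(x):
    """Toy objective with a single peak at x = 3."""
    return -(x - 3.0) ** 2

pop = [random.uniform(-10, 10) for _ in range(20)]
for _ in range(50):
    # Tournament selection (best of 3) followed by Gaussian mutation.
    parents = [max(random.sample(pop, 3), key=fitness) for _ in range(20)]
    pop = [p + random.gauss(0, 0.3) for p in parents]

best = max(pop, key=fitness)  # near the optimum x = 3
```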