Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad hoc networks with effective algorithms remains an open and challenging problem. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To solve the congestion problem effectively, we propose a distributed cross-layer scheduling algorithm empowered by graph-based multi-agent deep reinforcement learning. Our algorithm adaptively adjusts the transmit power in real time based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low owing to regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm reduces the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
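As a rough illustration of how a node might weight information exchanged with its neighbors, the sketch below aggregates neighbor feature vectors with softmax attention weights. It is a simplified stand-in, not the architecture of the paper: graph attention layers normally apply learned projections before scoring, whereas this version scores raw features by dot product. The function names `softmax` and `attend`, and the idea of using a two-element feature vector (for example a normalized channel gain and queue length), are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(own, neighbors):
    """Mix neighbor feature vectors using dot-product attention weights.

    Simplification (assumption): raw features are scored directly,
    with no learned projection or nonlinearity.
    """
    # One score per neighbor: similarity between own and neighbor features.
    scores = [sum(a * b for a, b in zip(own, nb)) for nb in neighbors]
    weights = softmax(scores)
    # Weighted sum of neighbor features, one output entry per dimension.
    mixed = [
        sum(w * nb[k] for w, nb in zip(weights, neighbors))
        for k in range(len(own))
    ]
    return weights, mixed
```

A node would call `attend(own_features, neighbor_features)` once per decision step and feed the mixed vector, together with its own features, to its local policy. Because only neighbors contribute, the cost per node grows with its degree rather than with the size of the network.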
Single-agent reinforcement learning (RL) is commonly used to learn how to play computer games, in which the agent makes one move before making the next in a sequential decision process. Recently, single-agent RL has also been employed in the design of molecules and drugs. While a single agent is a good fit for computer games, it has limitations when used in molecule design: its sequential learning makes it impossible to modify or improve previous steps while working on the current step. In this paper, we propose applying a multi-agent RL approach to molecular research, which can optimize all sites of a molecule simultaneously. To demonstrate the validity of our approach, we chose one chemical compound, Favipiravir, and explored its local chemical space. Favipiravir is a broad-spectrum inhibitor of viral RNA polymerase and is one of the compounds currently being used in SARS-CoV-2 (COVID-19) clinical trials. Our experiments revealed the collaborative learning of a team of deep RL agents as well as the learning of each individual agent in the exploration of Favipiravir. In particular, our multi-agent system discovered not only the molecules near Favipiravir in chemical space but also the learnability of each site in the string representation of Favipiravir, which is critical information for understanding the underlying mechanism that supports machine learning of molecules.
Multi-agent reinforcement learning algorithms are studied. A prediction-based multi-agent reinforcement learning algorithm is presented for multi-robot cooperation tasks. A multi-robot cooperation experiment based on a multi-agent inverted pendulum is conducted to test the efficiency of the new algorithm, and the experimental results show that the new algorithm learns the cooperation strategy much faster than the original multi-agent reinforcement learning algorithm.
Cooperative multi-agent reinforcement learning (MARL) is an important topic in the field of artificial intelligence, in which distributed constraint optimization (DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However, dense communication among agents limits the practicality of DCOP algorithms. In this paper, we propose a novel DCOP algorithm that addresses the communication problem of previous DCOP algorithms by reducing constraints. The contributions of this paper are primarily threefold: (1) It is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms. (2) A criterion is provided to identify insignificant constraints whose elimination does not have a great impact on the performance of the whole system. (3) A constraint-reduced DCOP algorithm is proposed that adopts a variant of the spectral clustering algorithm to detect and eliminate the insignificant constraints. Our algorithm reduces the communication burden of the benchmark DCOP algorithm while keeping its overall performance unaffected. The performance of the constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks. The effectiveness of the communication reduction is also verified by comparing the constraint-reduced DCOP with the benchmark DCOP.
Q-learning is a popular temporal-difference reinforcement learning algorithm which often explicitly stores state values using lookup tables. This implementation has been proven to converge to the optimal solution, but it is often beneficial to use a function-approximation system, such as deep neural networks, to estimate state values. It has been previously observed that Q-learning can be unstable when using value function approximation or when operating in a stochastic environment. This instability can adversely affect the algorithm’s ability to maximize its returns. In this paper, we present a new algorithm called Multi Q-learning to attempt to overcome the instability seen in Q-learning. We test our algorithm on a 4 × 4 grid-world with different stochastic reward functions using various deep neural networks and convolutional networks. Our results show that in most cases, Multi Q-learning outperforms Q-learning, achieving average returns up to 2.5 times higher than Q-learning and having a standard deviation of state values as low as 0.58.
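For readers unfamiliar with the lookup-table baseline mentioned above, the following is a minimal sketch of standard tabular Q-learning on a 4 × 4 grid with noisy rewards. It shows plain Q-learning only, not the Multi Q-learning algorithm of the paper, whose details are not given in the abstract. The grid layout, the Gaussian reward model, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration.

```python
import random

SIZE = 4  # 4 x 4 grid, states numbered 0..15 row by row
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GOAL = SIZE * SIZE - 1  # bottom-right corner (assumed goal)


def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, terminal=False):
    """One Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if terminal else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def step(state, action, rng):
    """Move on the grid; rewards are noisy (hypothetical reward model)."""
    row, col = divmod(state, SIZE)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), SIZE - 1)  # walls clamp the move
    col = min(max(col + d_col, 0), SIZE - 1)
    next_state = row * SIZE + col
    done = next_state == GOAL
    mean = 10.0 if done else -1.0
    return next_state, rng.gauss(mean, 1.0), done


def train(episodes=500, epsilon=0.1, seed=0):
    """Epsilon-greedy training loop; returns the learned lookup table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(SIZE * SIZE)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action, rng)
            q_update(q, state, action, reward, next_state, terminal=done)
            state = next_state
            if done:
                break
    return q
```

Calling `train()` returns a 16 × 4 table of action values. The `max` over next-state values in `q_update` is the step that is sensitive to reward noise, which is one commonly cited source of the instability the abstract refers to.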
This paper introduces a hybrid computation model that combines a genetic algorithm with reinforcement learning for independent agent learning in continuous, distributed, open environments. The model takes full advantage of the reactivity and robustness of reinforcement learning and of the suitability of genetic algorithms for problems with high dimensionality, large collectives, and complex environments. The results verify that, with proper training, the method is applicable to complex multi-agent environments.
AGV (automated guided vehicle) dispatching, one of the hot problems in flexible manufacturing systems (FMS), has attracted widespread interest in recent years. It is hard to schedule AGVs dynamically with predesigned rules because of the uncertainty and dynamic nature of the AGV dispatching process, so the AGV system in this paper is treated as a cooperative learning multi-agent system in which each agent adopts a multilevel decision method with two levels of decisions: the option level and the action level. On the option level, an agent learns a policy to execute a subtask with the best response to the other AGVs’ current options. On the action level, an agent learns an optimal policy of actions for achieving its planned option. The method is applied to an AGV dispatching simulation, and the performance of the AGV system based on this method is verified.
A novel centralized approach for Dynamic Spectrum Allocation (DSA) in the Cognitive Radio (CR) network is presented in this paper. Instead of giving the solution in terms of formulas that model the network environment, such as linear programming or convex optimization, the new approach learns environment performance iteratively and online by using a Reinforcement Learning (RL) algorithm after observing the variability and uncertainty of heterogeneous wireless networks. Appropriate access decisions can then be obtained by employing a Fuzzy Inference System (FIS), which ensures that the strategy can explore the possible states and exploit past experience sufficiently. The new approach effectively considers multiple objectives, such as spectrum efficiency and fairness between CR Access Points (AP). By interacting with the environment and accumulating experience, it can achieve the largest expected long-term reward on the desired objectives and implement the best action. Moreover, the present algorithm is relatively simple and does not require complex calculations. Simulation results show that the proposed approach achieves better performance than a fixed frequency planning scheme or a general dynamic spectrum allocation policy.
Funding (ad hoc network congestion control paper): Supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)); the National Natural Science Foundation of China under Grant No. 61971264; and the National Natural Science Foundation of China/Research Grants Council Collaborative Research Scheme under Grant No. 62261160390.
Funding (prediction-based multi-robot cooperation paper): Sponsored by the Ministerial Level Foundation (70302).
Funding (constraint-reduced DCOP paper): Supported by the National Social Science Foundation of China (15ZDA034, 14BZZ028), the Beijing Social Science Foundation (16JDGLA036), and the JKF Program of People’s Public Security University of China (2016JKF01318).
Funding (genetic algorithm and reinforcement learning paper): Supported by the National Natural Science Foundation of China (60474035), the National Research Foundation for the Doctoral Program of Higher Education of China (20050359004), and the Natural Science Foundation of Anhui Province (070412035).
Funding (Dynamic Spectrum Allocation paper): Supported in part by the National Science Fund for Distinguished Young Scholars under Grant No. 60725105; the National Basic Research Program of China (973 Program) under Grant No. 2009CB320404; the National Natural Science Foundation of China under Grant No. 61072068; and the Fundamental Research Funds for the Central Universities under Grant No. JY10000901031.