Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00155885, Artificial Intelligence Convergence Innovation Human Resources Development (Hanyang University ERICA)); the National Natural Science Foundation of China under Grant No. 61971264; and the National Natural Science Foundation of China/Research Grants Council Collaborative Research Scheme under Grant No. 62261160390.
Abstract: Due to the fading characteristics of wireless channels and the burstiness of data traffic, designing effective algorithms for congestion in ad hoc networks remains an open and challenging problem. In this paper, we focus on enabling congestion control that minimizes network transmission delay through flexible power control. To solve the congestion problem effectively, we propose a distributed cross-layer scheduling algorithm empowered by graph-based multi-agent deep reinforcement learning. Our algorithm adaptively adjusts the transmit power in real time based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low owing to regional cooperation based on a graph attention network. In the evaluation, we show that our algorithm reduces the transmission delay of data flows under severe signal interference and rapidly changing channel states, and we demonstrate its adaptability and stability across different topologies. The method is general and can be extended to various types of topologies.
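The abstract gives no implementation details, so the following is only a minimal sketch of the described idea: each node forms a local observation from its channel state information (CSI) and queue length, exchanges embeddings with one-hop neighbors, and a single graph-attention layer aggregates those messages before a policy head outputs a distribution over discrete transmit-power levels. All names and dimensions here (GATPowerAgent, obs_dim, power_levels, and so on) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the paper's idea (not the authors' implementation):
# local observation -> neighbor message passing via graph attention -> power policy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATPowerAgent(nn.Module):
    def __init__(self, obs_dim=4, hidden=32, power_levels=8):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden)
        self.attn = nn.Linear(2 * hidden, 1)   # scores a (node, neighbor) pair
        self.policy = nn.Linear(hidden, power_levels)

    def forward(self, obs, adj):
        # obs: [N, obs_dim] local observations (e.g., CSI and queue length)
        # adj: [N, N] 0/1 adjacency with self-loops; adj[i, j] = 1 if j neighbors i
        h = torch.relu(self.embed(obs))                      # [N, hidden]
        N = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(N, N, -1),
                           h.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs)).squeeze(-1)       # [N, N] raw scores
        e = e.masked_fill(adj == 0, float('-inf'))           # attend to neighbors only
        alpha = torch.softmax(e, dim=-1)                     # attention weights
        h_agg = alpha @ h                                    # neighbor aggregation
        return torch.softmax(self.policy(h_agg), dim=-1)     # power-level distribution

# Usage: 5 nodes on a ring topology (self-loops included).
obs = torch.randn(5, 4)
adj = torch.eye(5) + torch.eye(5).roll(1, 0) + torch.eye(5).roll(-1, 0)
probs = GATPowerAgent()(obs, adj)   # [5, 8] per-node power policies
```

Because each node's update depends only on its own observation and its neighbors' embeddings, such a policy can be executed distributedly, which is consistent with the abstract's claim of using only local information and local communication.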
Funding: Natural Science Foundation of Zhejiang Province, Grant/Award Number: LQ15F030006; Key Research and Development Program of Zhejiang Province, Grant/Award Number: 2018C01085.
Abstract: The asynchronous advantage actor-critic (A3C) algorithm is a commonly used policy-optimization algorithm in reinforcement learning, in which "asynchronous" refers to parallel interactive sampling and training, and "advantage" refers to a multi-step reward estimation method used to weight the policy update. To address the low efficiency and poor convergence caused by the traditional heuristic exploration of the A3C algorithm, an improved A3C algorithm is proposed in this paper. In this algorithm, a noisy network function that updates its noise tensor in an explicit way is constructed to train the agent. Generalised advantage estimation (GAE) is also adopted to compute the advantage function. Finally, a new mean-gradient parallelisation method is designed to update the parameters of both the primary and secondary networks by summing and averaging the gradients passed from all sub-processes to the main process. Simulation experiments were conducted in a Gym environment using the PyTorch Agent Net (PTAN) reinforcement-learning library, and the results show that the method enables the agent to complete training faster and to converge more reliably. The improved A3C algorithm outperforms the original algorithm and can provide new ideas for subsequent research on reinforcement-learning algorithms.
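No code accompanies this description either, so the sketch below is a hedged reconstruction of the three named ingredients under common readings: a factorised noisy linear layer whose noise tensors are resampled explicitly (one standard form of noisy networks), a GAE routine, and a mean-gradient step that averages the gradients gathered from all sub-processes before updating the global network. Every name here (NoisyLinear, gae_advantages, mean_gradient_update) is an illustrative assumption, not the paper's API.

```python
# Hedged sketches of the three techniques named in the abstract; these are
# common textbook forms, not the paper's actual implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Factorised noisy linear layer: noise tensors are resampled explicitly
    via reset_noise(), one plausible reading of 'updates the noise tensor
    in an explicit way'."""
    def __init__(self, in_f, out_f, sigma0=0.5):
        super().__init__()
        self.in_f, self.out_f = in_f, out_f
        bound = 1.0 / math.sqrt(in_f)
        self.w_mu = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.w_sigma = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.b_mu = nn.Parameter(torch.empty(out_f).uniform_(-bound, bound))
        self.b_sigma = nn.Parameter(torch.full((out_f,), sigma0 * bound))
        self.register_buffer('w_eps', torch.zeros(out_f, in_f))
        self.register_buffer('b_eps', torch.zeros(out_f))
        self.reset_noise()

    def reset_noise(self):
        f = lambda x: x.sign() * x.abs().sqrt()   # factorised-noise transform
        eps_in, eps_out = f(torch.randn(self.in_f)), f(torch.randn(self.out_f))
        self.w_eps.copy_(torch.outer(eps_out, eps_in))
        self.b_eps.copy_(eps_out)

    def forward(self, x):
        return F.linear(x, self.w_mu + self.w_sigma * self.w_eps,
                        self.b_mu + self.b_sigma * self.b_eps)

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """GAE over one non-terminating segment (done flags omitted for brevity):
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t);  A_t = sum_l (gamma*lam)^l delta_{t+l}."""
    values = torch.cat([values, last_value.view(1)])
    adv, running = torch.zeros_like(rewards), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv   # critic targets would be adv + values[:-1]

def mean_gradient_update(global_net, worker_grads, optimizer):
    """Average per-parameter gradients from all sub-processes, then take one
    optimiser step on the global network (a rough reading of the described
    mean-gradient parallelisation; synchronisation details are not given)."""
    for p, grads in zip(global_net.parameters(), zip(*worker_grads)):
        p.grad = torch.stack(list(grads)).mean(dim=0)
    optimizer.step()
    optimizer.zero_grad()
```

In an A3C-style loop, each worker would compute gradients on its own trajectory using gae_advantages, periodically call reset_noise() for fresh exploration, and ship its gradients to the main process, where mean_gradient_update applies the averaged step to the global network.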