Funding: Beijing Municipal Natural Science Foundation (Nos. L191001 and 4181002), the National Natural Science Foundation of China under Grant Nos. 61672082 and 61822101, and the Newton Advanced Fellowship under Grant No. 62061130221.
Abstract: Dynamic channel assignment (DCA) is significant for extending vehicular ad hoc network (VANET) capacity and mitigating congestion. However, unknown global state information and the lack of centralized control make channel assignment a challenging task in a distributed vehicular direct communication scenario. In our preliminary field test of communication under a V2X scenario, we found that existing DCA technology cannot fully meet the communication performance requirements of VANETs. To improve communication performance, this paper first demonstrates the feasibility and potential of reinforcement learning (RL) methods for jointly designing the channel selection decision and the access back-off adaptation. A dual reinforcement learning (DRL)-based cooperative DCA (DRL-CDCA) mechanism is then proposed. Specifically, DRL-CDCA jointly optimizes the decision-making behaviors of both channel selection and back-off adaptation based on a multi-agent dual reinforcement learning framework. In addition, nodes locally share and incorporate their individual rewards after each communication to achieve regionally consistent optimization. Simulation results show that, compared with two existing mechanisms, the proposed DRL-CDCA reduces the one-hop packet delay and improves the average packet delivery ratio.
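The two cooperating learners described above, one choosing a channel and one choosing a back-off setting, both updated from a single shared reward, can be illustrated with a minimal stateless Q-learning sketch. The channel model, reward shaping, and all constants below are illustrative assumptions, not the paper's setup:

```python
import random

random.seed(0)

N_CHANNELS = 3   # assumed number of selectable service channels
N_BACKOFF = 4    # assumed set of back-off (contention window) levels
ALPHA, EPS = 0.1, 0.2

# One value table per decision, mirroring the dual-agent split.
q_channel = [0.0] * N_CHANNELS
q_backoff = [0.0] * N_BACKOFF

def eps_greedy(q):
    """Explore with probability EPS, otherwise exploit."""
    if random.random() < EPS:
        return random.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def simulated_reward(ch, bo):
    # Toy environment: channel 2 is least congested, and a moderate
    # back-off (index 1) best balances delay against collisions.
    return (1.0 if ch == 2 else 0.2) + (0.5 if bo == 1 else 0.1) \
        + random.gauss(0.0, 0.05)

for _ in range(2000):
    ch = eps_greedy(q_channel)
    bo = eps_greedy(q_backoff)
    r = simulated_reward(ch, bo)  # one shared reward drives both agents
    # Stateless (bandit-style) Q updates, a simplification of the
    # paper's full RL formulation.
    q_channel[ch] += ALPHA * (r - q_channel[ch])
    q_backoff[bo] += ALPHA * (r - q_backoff[bo])

best_channel = max(range(N_CHANNELS), key=lambda a: q_channel[a])
best_backoff = max(range(N_BACKOFF), key=lambda a: q_backoff[a])
```

Because the reward is shared, each learner's table absorbs the quality of the joint decision, which is the essence of the cooperative design.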
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62131005 and U19B2014, and in part by the National Key Research and Development Program of China under Grant 254.
Abstract: Unmanned aerial vehicle (UAV)-assisted communications have been considered a solution for aerial networking in future wireless networks due to their low cost, high mobility, and agility. This paper considers UAV-assisted downlink transmission, where UAVs are deployed as aerial base stations to serve ground users. To maximize the average transmission rate among the ground users, the paper formulates a joint optimization problem of UAV trajectory design and channel selection, which is NP-hard and non-convex. To solve this problem, we propose a multi-agent deep Q-network (MADQN) scheme. Specifically, the UAVs act as agents that take actions from their local observations in a distributed fashion and share a common reward. To tackle tasks for which experience is insufficient, we propose a multi-agent meta reinforcement learning algorithm that adapts quickly to new tasks. By pretraining on tasks with a similar distribution, the learning model acquires general knowledge. Simulation results indicate that the MADQN scheme achieves higher throughput than fixed allocation. Furthermore, the proposed multi-agent meta reinforcement learning algorithm learns new tasks much faster than the MADQN scheme.
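The claim that pretraining on similarly distributed tasks enables fast adaptation can be illustrated with a tiny Reptile-style meta-learning sketch (Reptile is one first-order meta-learning algorithm, used here as a stand-in for the paper's method; the task family and all constants are illustrative assumptions):

```python
import random

random.seed(1)

# Toy task family: minimize (w - c)^2, where each task's optimum c
# is drawn from a narrow distribution around 3.0.
def sample_task():
    return 3.0 + random.uniform(-0.5, 0.5)

def inner_sgd(w, c, steps=10, lr=0.1):
    """Plain gradient descent on a single task from init w."""
    for _ in range(steps):
        w -= lr * 2 * (w - c)  # gradient of (w - c)^2
    return w

# Reptile-style meta-pretraining: nudge the shared init toward each
# sampled task's adapted solution.
meta_w, meta_lr = 0.0, 0.5
for _ in range(200):
    c = sample_task()
    adapted = inner_sgd(meta_w, c)
    meta_w += meta_lr * (adapted - meta_w)

# The meta-learned init sits near the task-family centre, so a new
# task needs only a few gradient steps to reach a low loss.
new_c = sample_task()
loss_from_meta = (inner_sgd(meta_w, new_c, steps=3) - new_c) ** 2
loss_from_scratch = (inner_sgd(0.0, new_c, steps=3) - new_c) ** 2
```

After three adaptation steps the meta-pretrained init incurs a much lower loss than a cold start, which is exactly the "general knowledge" effect the abstract describes.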
Abstract: For many years, researchers have explored model-driven power allocation (PA) algorithms for wireless networks with interfering multi-user communications. Nowadays, data-driven machine learning methods have become popular for analyzing wireless communication systems, among which deep reinforcement learning (DRL) plays a significant role in solving optimization problems under constraints. To this end, this paper investigates the PA problem in a k-user multiple access channel (MAC), where k transmitters (e.g., mobile users) each aim to send an independent message to a common receiver (e.g., a base station) over wireless channels. We first train a deep Q-network (DQN) with a deep Q-learning (DQL) algorithm in a simulation environment using offline learning. The DQN is then used with real data in online training for the PA problem, maximizing the sum rate subject to the source power constraint. Simulation results indicate that the proposed DQN method achieves a better sum rate than conventional optimization approaches such as fractional programming (FP) and weighted minimum mean squared error (WMMSE). Additionally, across different user densities, the proposed DQN outperforms the benchmark algorithms, verifying its good generalization ability over wireless multi-user communication systems.
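The sum-rate objective such a DQN maximizes can be made concrete with a small sketch. The channel gains, noise power, and discrete power levels below are illustrative assumptions, and interference is treated as noise at the receiver (a common simplification; the paper's exact signal model may differ). A brute-force search over the joint action space yields the optimum a learned policy should approach:

```python
import math
from itertools import product

# Toy 3-user MAC: fixed channel gains toward one common receiver.
GAINS = [1.0, 0.6, 0.3]
NOISE = 0.1
P_LEVELS = [0.0, 0.5, 1.0]  # discrete transmit powers (P_max = 1)

def sum_rate(powers):
    """Shannon sum rate, treating other users' signals as noise."""
    total = 0.0
    for k, (p, g) in enumerate(zip(powers, GAINS)):
        if p == 0.0:
            continue
        interference = sum(pj * gj for j, (pj, gj)
                           in enumerate(zip(powers, GAINS)) if j != k)
        total += math.log2(1 + p * g / (NOISE + interference))
    return total

# Exhaustive search over all joint power choices -- the benchmark
# the DQN's learned policy is trained to approximate.
best = max(product(P_LEVELS, repeat=len(GAINS)), key=sum_rate)
best_rate = sum_rate(best)
```

With these toy numbers the search lands on a strongly interference-limited solution, which shows why naive full-power transmission is not optimal and why the allocation problem is worth learning.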
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. 2020R1G1A1100493).
Abstract: Mobility support for changing the connection from one access point (AP) to the next (i.e., handover) is one of the important issues in IEEE 802.11 wireless local area networks (WLANs). During handover, the channel scanning procedure, which collects neighbor AP (NAP) information on all available channels, accounts for most of the delay. To eliminate the channel scanning procedure, a neighbor beacon frame transmission scheme (N-BTS) was previously proposed for seamless handover. N-BTS provides a seamless handover by removing the channel scanning procedure; however, it always incurs operating overhead, even when few mobile stations (MSs) require handover. Therefore, this paper proposes a reinforcement learning-based handover scheme with neighbor beacon frame transmission (MAN-BTS) that properly decides when to use N-BTS. An optimization problem is defined to maximize the expected reward, and the optimal policy is found using Q-learning. Simulation results show that the proposed scheme outperforms the comparison schemes in terms of expected reward.
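The decision MAN-BTS learns, namely whether running N-BTS is worth its fixed overhead given how many MSs are about to hand over, can be sketched with tabular Q-learning. The states, costs, and arrival model below are illustrative assumptions, not the paper's system model:

```python
import random

random.seed(2)

SCAN_COST = 5.0   # assumed per-MS channel-scanning delay cost
NBTS_COST = 7.0   # assumed fixed N-BTS operating overhead
ALPHA, GAMMA, EPS = 0.1, 0.5, 0.1
N_STATES = 4      # state = number of MSs about to hand over (0..3)

# Q[state][action]: action 0 = legacy scanning, action 1 = run N-BTS.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def reward(state, action):
    """Negative cost: N-BTS is flat, scanning scales with MS count."""
    return -(NBTS_COST if action == 1 else SCAN_COST * state)

state = random.randrange(N_STATES)
for _ in range(10000):
    if random.random() < EPS:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda x: Q[state][x])
    r = reward(state, a)
    nxt = random.randrange(N_STATES)  # i.i.d. handover arrivals (toy)
    Q[state][a] += ALPHA * (r + GAMMA * max(Q[nxt]) - Q[state][a])
    state = nxt

# Learned rule: scan while few MSs hand over, switch to N-BTS once
# the aggregate scanning delay exceeds the fixed overhead.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

The threshold structure of the learned policy reflects the trade-off stated in the abstract: N-BTS pays off only when enough stations are handing over.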
Funding: Supported by the joint research project with ZTE Corporation under Grant No. HC-CN-2020120002.
Abstract: The sum-rate maximization beamforming problem for a multi-cell, multi-user, multiple-input single-output interference channel (MISO-IC) system is considered. Conventional centralized and distributed beamforming solutions to the MISO-IC system have high computational complexity and carry a heavy burden of channel state information exchange between base stations (BSs), which becomes even worse in large-scale antenna systems. To address this, we propose a distributed deep reinforcement learning (DRL)-based approach with limited information exchange. Specifically, the original beamforming problem is decomposed into the subproblems of beam direction design and power allocation, significantly reducing the cost of information exchange between BSs. In particular, each BS is equipped with an independent deep deterministic policy gradient network that learns to choose the beam direction and simultaneously allocate power to users. Simulation results illustrate that the proposed DRL-based approach achieves sum-rate performance comparable to conventional distributed beamforming solutions with much less information exchange.
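The decomposition into beam direction and power can be made concrete with a short sketch. Here each BS fixes its direction locally via maximum-ratio transmission toward its own user (one common heuristic, standing in for the learned direction policy), so that only scalar powers remain to be coordinated between BSs. The channel vector is an illustrative assumption:

```python
import math

def mrt_direction(h):
    """Unit-norm beamformer aligned with the user's channel h."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    return [x / norm for x in h]

def beamformer(h, power):
    """Separate the two decisions: direction (local) and power (coordinated)."""
    w_dir = mrt_direction(h)
    scale = math.sqrt(power)
    return [scale * x for x in w_dir]

h = [1 + 2j, 0.5 - 1j]       # illustrative 2-antenna channel
w = beamformer(h, power=4.0)
# ||w||^2 equals the allocated power, and w is colinear with h, so
# the beam direction and the power level are chosen independently.
```

Because the direction needs only local CSI, only the scalar power levels have to be exchanged or learned cooperatively, which is where the information-exchange savings come from.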
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61672082 and 61822101), the Beijing Municipal Natural Science Foundation, China (No. 4181002), and the Beihang University Innovation and Practice Fund for Graduate, China (No. YCSJ-02-2018-05).
Abstract: Dynamic channel assignment (DCA) plays a key role in extending vehicular ad-hoc network capacity and mitigating congestion. However, channel assignment in vehicular direct communication scenarios faces the mutual influence of large-scale nodes, the lack of centralized coordination, unknown global state information, and other challenges. To solve this problem, a multi-agent reinforcement learning (RL)-based cooperative DCA (RL-CDCA) mechanism is proposed. Specifically, each vehicular node can learn proper channel selection and back-off adaptation strategies from real-time channel state information (CSI) using two cooperative RL models. In addition, neural networks are constructed as nonlinear Q-function approximators, which facilitates mapping the continuously sensed input to a mixed policy output. Nodes are driven to locally share and incorporate their individual rewards so that they can optimize their policies in a distributed, collaborative manner. Simulation results show that the proposed multi-agent RL-CDCA reduces the one-hop packet delay by no less than 73.73%, improves the packet delivery ratio by no less than 12.66% on average in highly dense situations, and improves the fairness of global network resource allocation.
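The local reward-sharing step, in which each node folds overheard neighbor rewards into its own before updating, can be sketched as a simple blend. The weight and the stateless update below are illustrative assumptions, not the paper's exact rule:

```python
BETA = 0.5  # assumed weight on the node's own reward

def regional_reward(own, neighbor_rewards):
    """Blend a node's reward with the mean of its one-hop neighbors'."""
    if not neighbor_rewards:
        return own
    return BETA * own + (1 - BETA) * sum(neighbor_rewards) / len(neighbor_rewards)

# A node that did well individually (reward 1.0) but whose region is
# congested (neighbors saw 0.2 and 0.6) is nudged toward choices
# that also relieve its neighborhood.
r = regional_reward(1.0, [0.2, 0.6])

q = 0.0
q += 0.1 * (r - q)  # the blended reward feeds the node's own Q update
```

Driving each node's update with the blended reward is what couples the individually learning agents into the distributed, collaborative optimization the abstract describes.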