Funding: This work was supported by the Fundamental Research Funds for the Central Universities of China under Grant No. PA2019GDQT0012, by the National Natural Science Foundation of China (Grant No. 61971176), and by the Applied Basic Research Program of Wuhan City, China, under Grant 2017010201010117.
Abstract: To support dramatically increased traffic loads, communication networks are becoming ultra-dense. Traditional cell association (CA) schemes are time-consuming, forcing researchers to seek faster schemes. This paper proposes a deep Q-learning based scheme whose main idea is to train a deep neural network (DNN) to calculate the Q values of all state-action pairs; the cell holding the maximum Q value is then associated. In the training stage, the intelligent agent continuously generates samples through trial and error to train the DNN until convergence. In the application stage, the state vectors of all users are fed into the trained DNN to quickly obtain a satisfactory CA result for a scenario with the same base station (BS) locations and user distribution. Simulations demonstrate that the proposed scheme provides satisfactory CA results in a computational time several orders of magnitude shorter than that of traditional schemes. Meanwhile, performance metrics such as capacity and fairness can be guaranteed.
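The application stage described above amounts to a greedy lookup: each user's state vector is passed to the trained network, and the cell with the largest Q value is chosen. The following is a minimal sketch under stated assumptions, not the authors' implementation. `associate_cells` and `toy_q_network` are hypothetical names, and the toy network simply treats the state as per-cell signal strength so that the example runs without a trained DNN.

```python
from typing import Callable, List, Sequence

# Hypothetical type: maps one user's state vector to one Q value per candidate cell.
QNetwork = Callable[[Sequence[float]], List[float]]


def associate_cells(states: Sequence[Sequence[float]], q_network: QNetwork) -> List[int]:
    """Return, for each user, the index of the cell with the maximum Q value."""
    associations = []
    for state in states:
        q_values = q_network(state)
        # Greedy choice: the cell holding the maximum Q value is associated.
        best_cell = max(range(len(q_values)), key=lambda cell: q_values[cell])
        associations.append(best_cell)
    return associations


def toy_q_network(state: Sequence[float]) -> List[float]:
    """Stand-in for a trained DNN (assumption): the state is taken to be the
    received signal strength from each cell, and the Q value equals it."""
    return list(state)


if __name__ == "__main__":
    user_states = [[0.2, 0.9, 0.4], [0.7, 0.1, 0.3]]
    print(associate_cells(user_states, toy_q_network))
```

Because association is a single forward pass and an argmax per user, the cost at application time does not depend on how long training took, which is consistent with the speed-up the abstract reports.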
Funding: The authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work under Project No. g01/n04.
Abstract: Deep Reinforcement Learning (DRL) is a class of Machine Learning (ML) that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environment to select its future actions efficiently. DRL has been used in many application fields, including games, robots, and networks, to create autonomous systems that improve themselves with experience. It is well acknowledged that DRL is well suited to solving optimization problems in distributed systems in general and in network routing in particular. Therefore, a novel query routing approach called Deep Reinforcement Learning based Route Selection (DRLRS), based on a Deep Q-Learning algorithm, is proposed for unstructured P2P networks. The main objective of this approach is to achieve better retrieval effectiveness at a reduced search cost, with fewer connected peers, fewer exchanged messages, and less time. The simulation results show a significant improvement in resource searching in comparison with k-Random Walker and Directed BFS. Here, retrieval effectiveness, search cost in terms of connected peers, and average overhead are 1.28, 106, and 149, respectively.
Funding: This work was partially supported by the Open Funding of the Shaanxi Key Laboratory of Intelligent Processing for Big Energy Data under Grant Number IPBED3, by the National Natural Science Foundation of China (NSFC) under Grant Number 61971189, and by the Fundamental Research Funds for the Central Universities under Grant Number 2020MS001.
Abstract: Collaborative vehicular networks are a key enabler for meeting the stringent ultra-reliable and low-latency communications (URLLC) requirements. A user vehicle (UV) dynamically optimizes task offloading by exploiting its collaborations with edge servers and vehicular fog servers (VFSs). However, the optimization of task offloading in highly dynamic collaborative vehicular networks faces several challenges, such as URLLC guarantees, incomplete information, and the curse of dimensionality. In this paper, we first characterize URLLC in terms of queuing delay bound violation and high-order statistics of excess backlogs. Then, a Deep Reinforcement lEarning-based URLLC-Aware task offloading algorithM named DREAM is proposed to maximize the throughput of the UVs while satisfying the URLLC constraints in a best-effort way. Compared with existing task offloading algorithms, DREAM achieves superior performance in throughput, queuing delay, and URLLC.
Funding: This work was supported by Universiti Tunku Abdul Rahman (UTAR), Malaysia, under UTARRF (IPSR/RMC/UTARRF/2021-C1/T05).
Abstract: The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of fifth-generation (5G) mobile networks. The cloud radio access network (CRAN) is a prominent framework in the 5G mobile network for meeting these requirements by deploying multiple low-cost, intelligent distributed antennas known as remote radio heads (RRHs). However, achieving optimal resource allocation (RA) in CRAN using the traditional approach is still challenging due to the complex structure. In this paper, we introduce a convolutional neural network-based deep Q-network (CNN-DQN) to balance energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build a 3-layer CNN to capture the environment features as an input state space. We then use DQN to turn the RRHs on and off dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraint and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. We conduct simulations to compare our proposed scheme with nature DQN and the traditional approach.
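Abstract: In path planning with dynamic obstacles, a mobile robot using the traditional deep Q-learning network (DQN) collides frequently during exploration and struggles to reach the target point. To address this problem, an improved DQN algorithm is proposed that modifies the exploration strategy and the experience replay mechanism. For the exploration strategy, the rapidly-exploring random tree (RRT) algorithm is used to automatically generate static prior knowledge that guides action selection, replacing the random actions of the ε-greedy strategy and raising the agent's success rate in reaching the target. For experience utilization, a clustered experience replay mechanism is designed using the K-means algorithm: experiences are clustered according to the position information of the dynamic obstacles, and experiences similar to the agent's current state are sampled preferentially for replay, so that the agent avoids collisions with dynamic obstacles more effectively. Simulation experiments in a two-dimensional grid environment show that, in a dynamic environment, the algorithm avoids both static and dynamic obstacles and successfully reaches the target point, verifying its feasibility for path planning with dynamic obstacle avoidance.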
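The clustered experience replay idea can be illustrated with a small sketch. It is not the authors' implementation and makes several assumptions: the K-means centroids are taken as already computed, each experience carries a hypothetical `obstacle_pos` field, and the batch is drawn only from the cluster nearest to the currently observed obstacle position. The abstract describes preferential sampling, so drawing exclusively from one cluster is a simplification.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_centroid(pos: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest (Euclidean distance) to pos."""
    return min(range(len(centroids)), key=lambda i: math.dist(pos, centroids[i]))


def cluster_experiences(
    experiences: Sequence[dict], centroids: Sequence[Point]
) -> Dict[int, List[dict]]:
    """Group experiences by the cluster of their dynamic-obstacle position."""
    clusters: Dict[int, List[dict]] = defaultdict(list)
    for exp in experiences:
        clusters[nearest_centroid(exp["obstacle_pos"], centroids)].append(exp)
    return clusters


def sample_similar(
    clusters: Dict[int, List[dict]],
    centroids: Sequence[Point],
    current_obstacle_pos: Point,
    batch_size: int,
    rng: random.Random,
) -> List[dict]:
    """Sample a replay batch from the cluster matching the current state."""
    cluster = clusters.get(nearest_centroid(current_obstacle_pos, centroids), [])
    if not cluster:
        return []
    # Never request more items than the cluster holds.
    return rng.sample(cluster, min(batch_size, len(cluster)))
```

A fuller version would refit the centroids as the buffer grows and mix in some uniformly sampled experiences so that the agent does not overfit to one obstacle configuration.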
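Abstract: The resource scheduling strategy of a container cloud system plays an important role in resource utilization and cluster performance. Existing container cluster scheduling does not fully consider resource usage within and between nodes, so container resource bottlenecks arise easily, causing low resource utilization and poor service reliability. To balance the workload of the container cluster and reduce the occurrence of container resource bottlenecks, a DQN (Deep Q-learning Network) based container cluster scheduling optimization algorithm, CS-DQN (Container Scheduling Optimization Strategy Based on DQN), is proposed. First, a load-balancing-oriented optimization model of container cluster resource utilization is proposed. Then, using deep reinforcement learning, a DQN-based container cluster scheduling algorithm is designed, and the relevant state space, action space, and reward function are defined. By introducing an improved DQN algorithm, a dynamic container scheduling policy that meets the optimization objective is generated through self-learning. Experimental results show that the scheduling policy increases the number of containers that can be deployed during scheduling, achieves good load balancing under different workloads, improves resource utilization, and better guarantees service reliability.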
Funding: This work was supported by the National Natural Science Foundation of China (62031017, 61971221).
Abstract: It is essential to maximize capacity while satisfying the transmission time delay of an unmanned aerial vehicle (UAV) swarm communication system. To address this challenge, a dynamic decentralized optimization mechanism is presented for joint spectrum and power (JSAP) resource allocation based on deep Q-learning networks (DQNs). Each UAV-to-UAV (U2U) link is regarded as an agent that is capable of identifying the optimal spectrum and power for communication. A convolutional neural network, a target network, and experience replay are adopted during training. The simulation results indicate that the proposed method can improve both communication capacity and the probability of successful data transmission when compared with random centralized assignment and multichannel access methods.
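Two of the training components named above, experience replay and a target network, can be sketched in a few lines. This is a generic DQN illustration rather than the authors' code. `ReplayBuffer`, `td_target`, the buffer capacity, and the discount factor `gamma` are all assumptions, and the target network is represented only by the Q values it would output for the next state.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


class ReplayBuffer:
    """Fixed-size buffer of past transitions."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        # The oldest transition is discarded automatically once capacity is reached.
        self._buffer.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Uniform sampling weakens the correlation between consecutive steps.
        return rng.sample(list(self._buffer), min(batch_size, len(self._buffer)))

    def __len__(self) -> int:
        return len(self._buffer)


def td_target(
    reward: float, next_q_target: Sequence[float], gamma: float, done: bool
) -> float:
    """Standard DQN target: r + gamma * max_a' Q_target(s', a'), or r if terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)
```

In a decentralized setting such as the one described, each U2U agent would typically hold its own buffer and its own pair of online and target networks, with the target network's weights copied from the online network at fixed intervals.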
Funding: Supported by the National Ministries and Research Funds (3020020221111).
Abstract: A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method for posture adjustment. The robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait, which is generated in advance by foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm were validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tilt angle is less than 3° after implementing the proposed method and that walking stability is clearly improved.