How to plan vehicle routes so that deliveries arrive within customer-specified time windows has long been a pressing problem in logistics. To address it, this paper proposes a Dueling Double Deep Q-Network (D3QN) with a soft-update strategy, designing the action space, state space, and reward function to model and solve the green vehicle routing problem with time windows. A total of 18 instances of small, medium, and large scale were selected, and the experimental results of three algorithms were compared along four dimensions: average reward, average number of dispatched vehicles, average mileage, and computation time. The results show that in most instances, compared with Double DQN and Dueling DQN, D3QN achieves a higher reward, dispatches fewer vehicles, and travels shorter distances within an acceptable increase in computation time, realizing the goal of green scheduling.
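The soft-update strategy mentioned above blends the online network's weights into the target network a little at each step instead of copying them wholesale every few thousand steps. A minimal sketch, with plain Python lists standing in for network parameters and an illustrative rate `tau`:

```python
def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1 - tau) * t for o, t in zip(online, target)]

# toy example: two scalar "parameters", soft-update rate tau = 0.1
target = [0.0, 4.0]
online = [1.0, 2.0]
updated = soft_update(target, online, 0.1)
print(updated)
```

Because `tau` is small, the target network tracks the online network slowly, which stabilizes the bootstrapped Q-targets.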
Funding: This work was supported by the Fundamental Research Funds for the Central Universities of China under Grant No. PA2019GDQT0012, by the National Natural Science Foundation of China (Grant No. 61971176), and by the Applied Basic Research Program of Wuhan City, China, under Grant 2017010201010117.
Abstract: To support dramatically increased traffic loads, communication networks are becoming ultra-dense. Traditional cell association (CA) schemes are time-consuming, forcing researchers to seek faster schemes. This paper proposes a deep Q-learning based scheme whose main idea is to train a deep neural network (DNN) to calculate the Q-values of all state-action pairs, and then associate the user with the cell holding the maximum Q-value. In the training stage, the intelligent agent continuously generates samples through trial and error to train the DNN until convergence. In the application stage, the state vectors of all users are fed into the trained DNN to quickly obtain a satisfactory CA result for a scenario with the same base station (BS) locations and user distribution. Simulations demonstrate that the proposed scheme provides satisfactory CA results in a computational time several orders of magnitude shorter than traditional schemes, while performance metrics such as capacity and fairness are still guaranteed.
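The greedy association step described above can be sketched as follows. A linear model with invented weights stands in for the trained DNN; the point is only the selection rule: compute one Q-value per candidate cell and pick the argmax.

```python
def q_values(state, weights):
    """One Q-value per candidate cell (a linear stand-in for the DNN)."""
    return [sum(w * s for w, s in zip(cell_w, state)) for cell_w in weights]

def associate(state, weights):
    """Associate the user with the cell holding the maximum Q-value."""
    qs = q_values(state, weights)
    return max(range(len(qs)), key=lambda i: qs[i])

# toy example: 3 cells, 2-dimensional user state (all numbers illustrative)
weights = [[0.2, 0.1], [0.9, -0.3], [0.4, 0.4]]
state = [1.0, 2.0]
print(associate(state, weights))  # cell index with the largest Q-value
```

At application time this is a single forward pass per user, which is why the scheme is orders of magnitude faster than iterative CA optimization.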
Funding: The authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work under Project No. g01/n04.
Abstract: Deep Reinforcement Learning (DRL) is a class of Machine Learning (ML) that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environment to select its future actions efficiently. DRL has been used in many application fields, including games, robots, and networks, to create autonomous systems that improve themselves with experience. It is well acknowledged that DRL is well suited to solving optimization problems in distributed systems in general and network routing in particular. Therefore, a novel query routing approach called Deep Reinforcement Learning based Route Selection (DRLRS) is proposed for unstructured P2P networks based on a Deep Q-Learning algorithm. The main objective of this approach is to achieve better retrieval effectiveness at a reduced search cost: fewer connected peers, fewer exchanged messages, and less time. The simulation results show a significant improvement in searching for a resource in comparison with k-Random Walker and Directed BFS. Here, retrieval effectiveness, search cost in terms of connected peers, and average overhead are 1.28, 106, and 149, respectively.
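DRLRS itself uses a deep Q-network, but the update rule it approximates is the classic Q-learning step, which can be sketched in tabular form for a routing decision. The state and peer names below are invented purely for illustration:

```python
def q_learning_step(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))
    return q[(s, a)]

q = {}                # Q-table: (state, action) -> value
actions = [0, 1]      # candidate neighbour peers to forward the query to
# forwarding the query to peer 1 found the resource (reward 1.0)
print(q_learning_step(q, "query@peerA", 1, 1.0, "query@peerB", actions))
```

A DQN replaces the table `q` with a neural network so the policy generalizes across query states it has never seen.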
Funding: This work was partially supported by the Open Funding of the Shaanxi Key Laboratory of Intelligent Processing for Big Energy Data under Grant Number IPBED3, by the National Natural Science Foundation of China (NSFC) under Grant Number 61971189, and by the Fundamental Research Funds for the Central Universities under Grant Number 2020MS001.
Abstract: Collaborative vehicular networks are a key enabler for meeting the stringent ultra-reliable and low-latency communications (URLLC) requirements. A user vehicle (UV) dynamically optimizes task offloading by exploiting its collaborations with edge servers and vehicular fog servers (VFSs). However, the optimization of task offloading in highly dynamic collaborative vehicular networks faces several challenges, such as guaranteeing URLLC, incomplete information, and the curse of dimensionality. In this paper, we first characterize URLLC in terms of the queuing delay bound violation and the high-order statistics of excess backlogs. Then, a Deep Reinforcement lEarning-based URLLC-Aware task offloading algorithM named DREAM is proposed to maximize the throughput of the UVs while satisfying the URLLC constraints in a best-effort way. Compared with existing task offloading algorithms, DREAM achieves superior performance in throughput, queuing delay, and URLLC.
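The queuing delay bound violation that the URLLC characterization above builds on can be illustrated with a simple empirical estimate. The sample delays and the bound below are made up, and the paper's actual formulation additionally uses high-order statistics of excess backlogs:

```python
def delay_violation_prob(delays, bound):
    """Empirical probability that the queuing delay exceeds a given bound."""
    return sum(1 for d in delays if d > bound) / len(delays)

# toy sample of observed queuing delays (ms) against a 4 ms bound
p = delay_violation_prob([1.0, 2.0, 5.0, 10.0, 3.0], 4.0)
print(p)  # 0.4
```

A URLLC constraint then takes the form of keeping this violation probability below a small target (e.g. 1e-5) rather than bounding the average delay.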
Abstract: With the integration of large numbers of DC sources and loads, hybrid AC/DC distribution network technology has become the development trend of future distribution networks. However, the uncertainty of sources and loads and the diversity of dispatchable devices pose great challenges to distribution network dispatch. This paper proposes a dispatch method for AC/DC distribution networks based on dual-agent deep reinforcement learning with a branching dueling Q-network (BDQ) and soft actor-critic (SAC). The method first combines the economic dispatch problem with the actions, rewards, and states of the two agents to build a Markov decision process for economic dispatch, and sets up two agents based on the BDQ and SAC methods, respectively, where the BDQ agent controls the discretely actuated devices in the distribution network and the SAC agent controls the continuously actuated devices. Then, through centralized training with decentralized execution, the two agents interact with the environment and are trained offline. Finally, the agents' parameters are fixed for online dispatch. The advantage of this method is that the dual agents can simultaneously control discretely actuated devices (capacitor banks and on-load tap-changing transformers) and continuously actuated devices (converters and energy storage), while the centralized training of the two agents adapts to the uncertainty of sources and loads. Tests on a modified IEEE 33-bus AC/DC distribution network verify the effectiveness of the proposed method.
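A BDQ agent tames the combinatorial discrete action space by giving each controlled device its own action branch and taking an independent argmax per branch. A minimal sketch of that selection step, with illustrative branch sizes (e.g. capacitor-bank steps and transformer tap positions) and invented Q-values:

```python
def bdq_select(branch_q_values):
    """Select one sub-action per branch via an independent argmax."""
    return [max(range(len(qs)), key=lambda i: qs[i]) for qs in branch_q_values]

# two illustrative branches: 3 capacitor-bank steps, 4 tap positions
joint_action = bdq_select([[0.1, 0.7, 0.2], [0.0, 0.3, 0.9, 0.4]])
print(joint_action)  # [1, 2]
```

With N branches of K options each, this evaluates N*K Q-values instead of the K^N joint actions a flat DQN would need.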
Abstract: To address the problem of returning leftover goods to storage during outbound operations in automated storage and retrieval systems, a storage-location optimization model for returned goods is established, taking the minimization of the stacker crane's total energy consumption as the objective and the assignment of return storage locations as the decision variable, and a deep reinforcement learning based optimization framework for return storage locations is proposed. Within this framework, a multi-dimensional state is constructed from the warehouse's real-time storage information and outbound operation information, the selection of a return storage location is taken as the action, and a Markov decision process model of the return storage-location optimization problem is established. The multi-dimensional state features of the warehouse are fed into a two-stream dueling network, and the dueling double deep Q-network (D3QN) algorithm is used to train the network model and predict the target value of each return action, thereby determining the agent's optimal policy. Experimental results show that the D3QN algorithm exhibits good stability in solving large-scale return storage-location optimization problems.
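D3QN combines two standard pieces: a dueling value/advantage head and the double-DQN target. Both can be sketched with plain lists standing in for network outputs (all numbers below are illustrative):

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN: the online net picks the action, the target net scores it."""
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[a_star]

q_next = dueling_q(1.0, [0.0, 1.0, 2.0])
y = double_dqn_target(0.5, 0.9, q_next, [3.0, 1.0, 2.0], done=False)
print(q_next, y)
```

Decoupling action selection from action evaluation is what curbs the Q-value overestimation of vanilla DQN, which matters for the stability the abstract reports on large instances.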
Funding: This work was supported by the National Natural Science Foundation of China (61873077, 61806062), the Zhejiang Provincial Major Research and Development Project of China (2020C01110), and the Zhejiang Provincial Key Laboratory of Equipment Electronics.
Abstract: Directly grasping tightly stacked objects may cause collisions and result in failures, degrading the functionality of robotic arms. Inspired by the observation that first pushing objects to a state of mutual separation and then grasping them individually can effectively increase the success rate, we devise a novel deep Q-learning framework to achieve collaborative pushing and grasping. Specifically, an efficient non-maximum suppression policy (PolicyNMS) is proposed to dynamically evaluate pushing and grasping actions by enforcing a suppression constraint on unreasonable actions. Moreover, a novel data-driven pushing reward network called PR-Net is designed to effectively assess the degree of separation or aggregation between objects. To benchmark the proposed method, we establish a common household items dataset (CHID) in both simulated and real scenarios. Although trained on simulation data only, experimental results validate that our method generalizes well to real scenarios, achieving a 97% grasp success rate at fast speed for object separation in a real-world environment.
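The abstract does not give the internals of PolicyNMS, but the effect of a suppression constraint can be sketched as masking out unreasonable actions before the greedy selection. The scores and mask below are invented for illustration:

```python
def masked_argmax(scores, allowed):
    """Greedy action selection restricted to non-suppressed actions."""
    candidates = [i for i, ok in enumerate(allowed) if ok]
    return max(candidates, key=lambda i: scores[i])

# action 0 has the best raw score but is suppressed as unreasonable,
# so the policy falls back to the best remaining action
best = masked_argmax([0.9, 0.8, 0.7], [False, True, True])
print(best)  # 1
```

Suppressing actions up front keeps the agent from wasting trial-and-error samples on motions that are known to be infeasible.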
Funding: Supported by the National Natural Science Foundation of China (Nos. U1813215 and 62273203).
Abstract: This paper focuses on the problem of active object detection (AOD). AOD is important for service robots completing tasks in home environments, as it leads the robot to approach the target object by taking appropriate moving actions. Most current AOD methods are based on reinforcement learning and suffer from low training efficiency and testing accuracy. Therefore, an AOD model based on a deep Q-learning network (DQN) with a novel training algorithm is proposed in this paper. The DQN model is designed to fit the Q-values of the various actions and includes a state space, feature extraction, and a multilayer perceptron. In contrast to existing research, a novel memory-based training algorithm is designed for the proposed DQN model to improve training efficiency and testing accuracy. In addition, a method of generating the end state is presented to judge when to stop the AOD task during training. Extensive comparison experiments and ablation studies on an AOD dataset show that the presented method outperforms comparable methods and that the proposed training algorithm is more effective than the raw training algorithm.
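Memory-based DQN training rests on an experience replay buffer: transitions are stored and later sampled in random mini-batches, breaking the correlation between consecutive steps. A minimal sketch of such a buffer (the paper's specific memory mechanism is not detailed in the abstract):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)  # evicts the oldest entry when full

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

mem = ReplayMemory(capacity=2)
mem.push(("s0", 0, 0.0, "s1", False))
mem.push(("s1", 1, 1.0, "s2", False))
mem.push(("s2", 0, 0.5, "s3", True))   # the "s0" transition is evicted
print(len(mem))  # 2
```

Sampling uniformly from this buffer is what lets the DQN reuse each interaction many times, which is one route to the training-efficiency gains the abstract claims.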
Funding: Supported by the National Natural Science Foundation of China (No. 61931011).
Abstract: Unmanned aerial vehicles (UAVs) are increasingly considered in safe autonomous navigation systems that explore unknown environments, where UAVs are equipped with multiple sensors to perceive their surroundings. However, how to achieve UAV-enabled data dissemination while simultaneously ensuring safe navigation is a new challenge. In this paper, our goal is to minimize the weighted sum of the UAV's task completion time while satisfying the data transmission task requirement and the UAV's feasible flight region constraints. This problem cannot be solved by standard optimization methods, mainly because a tractable and accurate system model is lacking in practice. To overcome this issue, we propose a new solution approach utilizing the advanced dueling double deep Q-network (dueling DDQN) with multi-step learning. Specifically, to improve the algorithm, extra labels are added to the primitive states. Simulation results indicate the validity and superior performance of the proposed algorithm under different data thresholds compared with two other benchmarks.
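Multi-step learning replaces the one-step bootstrap target with an n-step return, letting rewards propagate faster through the value function. A minimal sketch (the rewards, discount, and bootstrap value below are illustrative):

```python
def n_step_return(rewards, gamma, bootstrap_value):
    """G = sum_k gamma^k * r_k + gamma^n * Q(s_n, a_n) over n collected steps."""
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return g + (gamma ** len(rewards)) * bootstrap_value

# 3-step return with gamma = 0.5 and a bootstrap value of 4.0
g = n_step_return([1.0, 1.0, 1.0], 0.5, 4.0)
print(g)  # 2.25
```

Larger n trades lower bias (less reliance on the imperfect bootstrap) for higher variance from the longer reward sequence.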