With the rapid development of mobile Internet, spatial crowdsourcing has become more and more popular. Spatial crowdsourcing consists of many different types of applications, such as spatial crowd-sensing services. Spatial crowd-sensing collects and analyzes traffic sensing data from clients like vehicles and traffic lights to construct intelligent traffic prediction models. Besides collecting sensing data, spatial crowdsourcing also includes spatial delivery services like DiDi and Uber. Appropriate task assignment and worker selection dominate the service quality of spatial crowdsourcing applications. Previous research conducted task assignment via traditional matching approaches or simple network models. However, advanced mining methods are lacking to explore the relationship between workers, task publishers, and the spatio-temporal attributes of tasks. Therefore, in this paper, we propose a Deep Double Dueling Spatial-temporal Q Network (D3SQN) to adaptively learn the spatial-temporal relationship between tasks, task publishers, and workers in a dynamic environment to achieve optimal allocation. Specifically, D3SQN is revised through reinforcement learning by adding a spatial-temporal transformer that can estimate the expected state values and action advantages so as to improve the accuracy of task assignments. Extensive experiments are conducted over real data collected from DiDi and ELM, and the simulation results verify the effectiveness of our proposed models.
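The abstract above mentions estimating state values and action advantages, which is the core of a dueling Q-network. The following is a minimal sketch of the standard mean-subtracted dueling aggregation, not the authors' D3SQN; the function name `dueling_q_values` and the numbers are hypothetical and chosen only for illustration.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Hypothetical example: one state value and advantages for three candidate workers.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
best_worker = max(range(len(q)), key=lambda i: q[i])
print(q, best_worker)
```

Subtracting the mean advantage keeps the split between the value and advantage streams identifiable; without it, a constant could be shifted between the two streams without changing Q.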
The recent surge of mobile subscribers and user data traffic has accelerated the telecommunication sector towards the adoption of fifth-generation (5G) mobile networks. The cloud radio access network (CRAN) is a prominent framework in the 5G mobile network to meet these requirements by deploying multiple low-cost, intelligent distributed antennas known as remote radio heads (RRHs). However, achieving optimal resource allocation (RA) in CRAN using traditional approaches is still challenging due to its complex structure. In this paper, we introduce a convolutional neural network-based deep Q-network (CNN-DQN) to balance energy consumption and guarantee the user quality of service (QoS) demand in downlink CRAN. We first formulate the Markov decision process (MDP) for energy efficiency (EE) and build a 3-layer CNN to capture the environment features as the input state space. We then use the DQN to turn the RRHs on and off dynamically based on the user QoS demand and energy consumption in the CRAN. Finally, we solve the RA problem based on the user constraint and transmit power to guarantee the user QoS demand and maximize the EE with a minimum number of active RRHs. We conduct simulations to compare our proposed scheme with the Nature DQN and the traditional approach.
A gait control method for a biped robot based on the deep Q-network (DQN) algorithm is proposed to enhance the stability of walking on uneven ground. This control strategy is an intelligent learning method of posture adjustment. A robot is taken as an agent and trained to walk steadily on an uneven surface with obstacles, using a simple reward function based on forward progress. The reward-punishment (RP) mechanism of the DQN algorithm is established after obtaining the offline gait, which is generated in advance by foot trajectory planning. Instead of implementing a complex dynamic model, the proposed method enables the biped robot to learn to adjust its posture on uneven ground and ensures walking stability. The performance and effectiveness of the proposed algorithm were validated in the V-REP simulation environment. The results demonstrate that the biped robot's lateral tilt angle is less than 3° after implementing the proposed method and that walking stability is clearly improved.
Federated learning (FL) activates distributed on-device computation techniques to achieve better algorithm performance through the interaction of local model updates and global model distributions in aggregation averaging processes. However, in large-scale heterogeneous Internet of Things (IoT) cellular networks, massive multi-dimensional model update iterations and resource-constrained computation are significant challenges to be tackled. This paper introduces a system model that converges software-defined networking (SDN) and network functions virtualization (NFV) to enable device/resource abstractions and provide NFV-enabled edge FL (eFL) aggregation servers for advancing automation and controllability. Multi-agent deep Q-networks (MADQNs) aim to enforce self-learning softwarization, optimize resource allocation policies, and guide computation offloading decisions. With gathered network conditions and resource states, the proposed agent explores various actions to estimate the expected long-term rewards in a particular state observation. In the exploration phase, optimal actions for joint resource allocation and offloading decisions in different possible states are obtained by maximum Q-value selection. An action-based virtual network function (VNF) forwarding graph (VNFFG) is orchestrated to map VNFs towards eFL aggregation servers with sufficient communication and computation resources in the NFV infrastructure (NFVI). The proposed scheme indicates deficient allocation actions, modifies the VNF backup instances, and reallocates the virtual resources for the exploitation phase. A deep neural network (DNN) is used as a value function approximator, and an epsilon-greedy algorithm balances exploration and exploitation. The scheme primarily considers the criticality of FL model services and congestion states to optimize the long-term policy. Simulation results show that the proposed scheme outperforms reference schemes in terms of Quality of Service (QoS) performance metrics, including packet drop ratio, packet drop counts, packet delivery ratio, delay, and throughput.
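The epsilon-greedy rule named in the abstract above can be sketched as follows. This is a generic illustration, not the paper's MADQN agent; the function name `epsilon_greedy` and the optional `rng` argument are assumptions made for the example.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """With probability epsilon pick a random action, otherwise the max-Q action."""
    source = rng if rng is not None else random
    if source.random() < epsilon:
        # Exploration: uniform choice over all action indices.
        return source.randrange(len(q_values))
    # Exploitation: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical Q-values for three offloading actions.
action = epsilon_greedy([0.2, 0.9, 0.1], epsilon=0.1, rng=random.Random(0))
```

A common design choice is to start with a large epsilon and decay it during training, so the agent explores early and exploits later.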
With the advent of Reinforcement Learning (RL) and its continuous progress, state-of-the-art RL systems have emerged for many challenging real-world tasks. Given the scope of this area, various techniques are found in the literature. One notable technique, Multiple Deep Q-Network (DQN) based RL systems, uses multiple DQN-based entities that learn together and communicate with each other. In such a scheme, the learning has to be distributed wisely among all entities and the inter-entity communication protocol has to be carefully designed. As more complex DQNs come to the fore, the overall complexity of these multi-entity systems has increased manyfold, leading to issues such as difficulty in training, high resource requirements, longer training time, and difficulty in fine-tuning, which in turn cause performance issues. Taking a cue from the parallel processing found in nature and its efficacy, we propose a lightweight ensemble-based approach for solving core RL tasks. It uses multiple binary-action DQNs with shared state and reward. The benefits of the proposed approach are overall simplicity, faster convergence, and better performance compared to conventional DQN-based approaches. The approach can potentially be extended to any type of DQN by forming its ensemble. In extensive experiments, the proposed ensemble approach obtains promising results on OpenAI Gym tasks and Atari 2600 games compared to recent techniques. It achieves a state-of-the-art score of 500 on the Cartpole-v1 task, 259.2 on the LunarLander-v2 task, and state-of-the-art results on four out of five Atari 2600 games.
In a rechargeable wireless sensor network, utilizing an unmanned aerial vehicle (UAV) as a mobile base station (BS) to charge sensors and collect data effectively prolongs the network’s lifetime. In this paper, we jointly optimize the UAV’s flight trajectory and the sensor selection and operation modes to maximize the average data traffic of all sensors within a wireless sensor network (WSN) during the UAV’s finite flight time, while ensuring the energy required by each sensor through wireless power transfer (WPT). We consider a practical scenario where the UAV has no prior knowledge of sensor locations. The UAV performs autonomous navigation based on the status information obtained within the coverage area, which is modeled as a Markov decision process (MDP). A deep Q-network (DQN) is employed to execute the navigation based on the UAV position, the battery level state, channel conditions, and current data traffic of sensors within the UAV’s coverage area. Our simulation results demonstrate that the DQN algorithm significantly improves the network performance in terms of the average data traffic and trajectory design.
The multi-agent path planning problem presents significant challenges in dynamic environments, primarily due to the ever-changing positions of obstacles and the complex interactions between agents’ actions. These factors cause the solution to converge slowly and, in some cases, to diverge altogether. To address this issue, this paper introduces a novel approach utilizing a double dueling deep Q-network (D3QN) tailored for dynamic multi-agent environments. A novel reward function based on multi-agent positional constraints is designed, and a training strategy based on incremental learning is performed to achieve collaborative path planning of multiple agents. Moreover, a greedy and Boltzmann probability selection policy is introduced for action selection and for avoiding convergence to local extrema. To match radar and image sensors, a convolutional neural network-long short-term memory (CNN-LSTM) architecture is constructed to extract the features of multi-source measurements as the input of the D3QN. The algorithm’s efficacy and reliability are validated in a simulated environment using the Robot Operating System and Gazebo. The simulation results show that the proposed algorithm provides a real-time solution for path planning tasks in dynamic scenarios. In terms of average success rate and accuracy, the proposed method is superior to other deep learning algorithms, and the convergence speed is also improved.
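The Boltzmann part of the selection policy mentioned above can be illustrated with a softmax over Q-values. This is a generic sketch, assuming a single `temperature` parameter; how the paper combines it with greedy selection is not stated in the abstract, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def boltzmann_probabilities(q_values: Sequence[float], temperature: float) -> List[float]:
    """Softmax over Q-values; higher temperature gives a flatter distribution."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(q_values: Sequence[float], temperature: float,
                     rng: random.Random) -> int:
    """Sample an action index in proportion to its Boltzmann probability."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for three movement actions.
probs = boltzmann_probabilities([1.0, 2.0, 3.0], temperature=1.0)
chosen = boltzmann_select([1.0, 2.0, 3.0], temperature=1.0, rng=random.Random(0))
```

Unlike pure greedy selection, every action keeps a nonzero probability, which is one way to reduce the risk of settling in a local extremum.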
Recent years have seen the rapid development of autonomous driving systems, which are typically designed in a hierarchical architecture or an end-to-end architecture. The hierarchical architecture is always complicated and hard to design, while the end-to-end architecture is more promising due to its simple structure. This paper puts forward an end-to-end autonomous driving method based on the deep reinforcement learning algorithm Dueling Double Deep Q-Network, making it possible for the vehicle to learn end-to-end driving by itself. First, this paper proposes an architecture for the end-to-end lane-keeping task. Unlike the traditional image-only state space, the presented state space is composed of both camera images and vehicle motion information. Second, a corresponding dueling neural network structure is introduced, which reduces the variance and improves sampling efficiency. Third, the proposed method is applied to The Open Racing Car Simulator (TORCS) to demonstrate its performance, where it surpasses human drivers. Finally, the saliency map of the neural network is visualized, which indicates that the trained network drives by observing the lane lines. A video of the presented work is available online at https://youtu.be/76ciJmIHMD8 or https://v.youku.com/v_show/id_XNDM4ODc0MTM4NA==.html.
In this paper, an unmanned aerial vehicle (UAV)-aided wireless emergency communication system is studied, where a UAV is deployed to support ground user equipments (UEs) for emergency communications. We aim to maximize the number of UEs served, the fairness, and the overall uplink data rate by optimizing the trajectory of the UAV and the transmission power of the UEs. We propose a deep Q-network (DQN) based algorithm, which combines the well-known deep neural network (DNN) and Q-learning, to solve the UAV trajectory problem. Then, based on the optimized UAV trajectory, we further propose a successive convex approximation (SCA) based algorithm to tackle the power control problem for each UE. Numerical simulations demonstrate that the proposed DQN-based algorithm achieves considerable performance gain over existing benchmark algorithms in terms of fairness, the number of UEs served, and overall uplink data rate by optimizing the UAV’s trajectory and the power allocation.
In dense-traffic unmanned aerial vehicle (UAV) ad-hoc networks, traffic congestion can cause increased delay and packet loss, which limit the performance of the networks; therefore, a traffic balancing strategy is required to control the traffic. In this study, we propose TQNGPSR, a traffic-aware Q-network enhanced geographic routing protocol based on greedy perimeter stateless routing (GPSR), for UAV ad-hoc networks. The protocol enforces a traffic balancing strategy using the congestion information of neighbors, and evaluates the quality of a wireless link with the Q-network algorithm, which is a reinforcement learning algorithm. Based on the evaluation of each wireless link, the protocol makes routing decisions among multiple available choices to reduce delay and decrease packet loss. We simulate the performance of TQNGPSR and compare it with AODV, OLSR, GPSR, and QNGPSR. Simulation results show that TQNGPSR obtains higher packet delivery ratios and lower end-to-end delays than GPSR and QNGPSR. In high node density scenarios, it also outperforms AODV and OLSR in terms of packet delivery ratio, end-to-end delay, and throughput.
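For context, the greedy mode of plain GPSR, which TQNGPSR builds on, forwards a packet to the neighbor geographically closest to the destination. The sketch below shows only that baseline step; it does not include the traffic-aware Q-network link evaluation described in the abstract, and the function name and coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def greedy_next_hop(current: Point, neighbors: Dict[str, Point],
                    destination: Point) -> Optional[str]:
    """Return the neighbor closest to the destination, if it improves on the current node.

    Returns None when no neighbor is closer than the current node, which is the
    local-maximum case where GPSR switches to perimeter forwarding.
    """
    best_name: Optional[str] = None
    best_distance = math.dist(current, destination)
    for name, position in neighbors.items():
        distance = math.dist(position, destination)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


# Hypothetical positions of two neighboring UAVs.
next_hop = greedy_next_hop((0.0, 0.0), {"uav_a": (1.0, 0.0), "uav_b": (3.0, 0.0)}, (5.0, 0.0))
```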
To optimize machine allocation and task dispatching in smart manufacturing factories, this paper proposes a manufacturing resource scheduling framework based on reinforcement learning (RL). The framework formulates the entire scheduling process as a multi-stage sequential decision problem and obtains the scheduling order by combining a deep convolutional neural network (CNN) with an improved deep Q-network (DQN). Specifically, in the representation of the Markov decision process (MDP), the feature matrix is taken as the state space and a set of heuristic dispatching rules forms the action space. In addition, the deep CNN is employed to approximate the state-action values, and the double dueling deep Q-network with prioritized experience replay and noisy network (D3QPN2) is adopted to determine the appropriate action according to the current state. In the experiments, compared with the traditional heuristic method, the proposed method learns a high-quality scheduling policy and achieves a shorter makespan on standard public datasets.
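Prioritized experience replay, one component of the D3QPN2 named above, samples transitions in proportion to a priority rather than uniformly. The sketch below shows the commonly used proportional rule P(i) = p_i^alpha / sum_k p_k^alpha; the abstract does not say which variant the paper uses, so this is an assumption, and the function name is hypothetical.

```python
from typing import List


def replay_probabilities(priorities: List[float], alpha: float) -> List[float]:
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical priorities, e.g. absolute TD errors of two stored transitions.
print(replay_probabilities([1.0, 3.0], alpha=1.0))
print(replay_probabilities([1.0, 3.0], alpha=0.0))
```

With `alpha = 0` the rule reduces to uniform sampling, and larger values of `alpha` focus replay more strongly on high-priority transitions.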
High penetration of distributed renewable energy sources and electric vehicles (EVs) makes the future active distribution network (ADN) highly variable. These characteristics pose great challenges to traditional voltage control methods. Voltage control based on the deep Q-network (DQN) algorithm offers a potential solution to this problem because it possesses human-level control performance. However, traditional DQN methods may overestimate action reward values, degrading the obtained solutions. In this paper, an intelligent voltage control method based on the averaged weighted double deep Q-network (AWDDQN) algorithm is proposed to overcome both the overestimation of action reward values in the DQN algorithm and the underestimation of action reward values in the double deep Q-network (DDQN) algorithm. In the proposed method, the voltage control objective is incorporated into the designed action reward values and normalized to form a Markov decision process (MDP) model, which is solved by the AWDDQN algorithm. The designed AWDDQN-based intelligent voltage control agent is trained offline and used as an online intelligent dynamic voltage regulator for the ADN. The proposed voltage control method is validated using the IEEE 33-bus and 123-bus systems containing renewable energy sources and EVs, and compared with methods based on the DQN and DDQN algorithms and with traditional methods based on mixed-integer nonlinear programming. The simulation results show that the proposed method has better convergence and less voltage volatility than the other methods.
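The overestimation and underestimation issues mentioned above come from how the bootstrap target is computed. The sketch below contrasts the standard DQN target with the standard double DQN target; it does not implement the averaged weighted variant (AWDDQN), whose formula is not given in the abstract. Function names and numbers are hypothetical.

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float, done: bool) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda i: next_q_online[i])
    return reward + gamma * next_q_target[best_action]


# Hypothetical next-state Q-values for two control actions.
y_dqn = dqn_target(1.0, [4.0, 3.0], gamma=0.5, done=False)
y_ddqn = double_dqn_target(1.0, [1.0, 2.0], [4.0, 3.0], gamma=0.5, done=False)
print(y_dqn, y_ddqn)
```

In this example the DQN target uses the largest target-network value, while the double DQN target uses the value of the action preferred by the online network, which is never larger.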
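To meet the needs of modern conference management, a conference management system with a hybrid C/S and B/S deployment is designed and implemented. The system comprises a conference management service center and several conference sites. The service center includes a data server, an application server, a Web server, a communication gateway, and an egress router; each conference site includes several laptop computers, RFID (Radio Frequency Identification) readers, QR code readers, information display and publishing devices, on-site WLAN equipment, and user terminals. RIA (Rich Internet Application) technology is used to optimize the B/S interface, RFID is applied to monitor the details of the conference process, and the SaaS (Software as a Service) model enables on-demand configuration and rapid deployment of conference management. The system significantly improves the efficiency of conference management.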
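Objective: To address the low user quality of service and the insufficient resources of edge nodes in vehicular edge computing. Methods: Combining vehicular edge computing and parking edge computing, an "end-multi-edge-cloud" cooperative computation offloading model is proposed, and a cooperative computation offloading and resource allocation algorithm based on DRL (DRL-CCORA) is designed. First, the computing power of vehicles parked at the roadside is organized into parking edge servers (PESs), which work jointly with edge nodes to provide computing services for vehicle tasks and relieve the load on edge nodes. Second, the computation offloading and resource allocation problem is transformed into a Markov decision process model; a reward function is constructed that combines delay, energy consumption, and quality of service; and computing tasks are pre-classified according to the computing resources they require, their maximum tolerable delay, and the distance from the vehicle to the PES, which reduces the problem size. Finally, the double deep Q-network (DDQN) algorithm is used to obtain the optimal policy for computation offloading and resource allocation. Results: Compared with the baseline algorithms, the proposed algorithm improves total user quality of service by 6.25% and the task completion rate by 10.26%, and reduces task computation delay and energy consumption by 18.8% and 5.26%, respectively. Conclusion: The proposed algorithm optimizes the load on edge nodes, reduces the delay and energy consumption of task completion, and improves user quality of service.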
Funding (spatial crowdsourcing D3SQN paper): Supported in part by the Pioneer and Leading Goose R&D Program of Zhejiang Province under Grant 2022C01083 (Dr. Yu Li, https://zjnsf.kjt.zj.gov.cn/) and the Pioneer and Leading Goose R&D Program of Zhejiang Province under Grant 2023C01217 (Dr. Yu Li, https://zjnsf.kjt.zj.gov.cn/).
Funding (CRAN CNN-DQN paper): Supported by the Universiti Tunku Abdul Rahman (UTAR), Malaysia, under UTARRF (IPSR/RMC/UTARRF/2021-C1/T05).
Funding (biped robot gait control paper): Supported by the National Ministries and Research Funds (3020020221111).
Funding (edge federated learning MADQN paper): This work was funded by BK21 FOUR (Fostering Outstanding Universities for Research) (No. 5199990914048). This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2020R1I1A3066543). In addition, this work was supported by the Soonchunhyang University Research Fund.
Funding (multi-agent path planning D3QN paper): National Natural Science Foundation of China (Nos. 61673262 and 50779033); National GF Basic Research Program (No. JCKY2021110B134); Fundamental Research Funds for the Central Universities.
Funding (end-to-end autonomous driving paper): This work is supported by the National Key Research and Development Project of China under Grant 2018YFB1600600 and the Beijing Natural Science Foundation under Grant JQ18010. The authors also thank the Tsinghua University-Didi Joint Research Center for Future Mobility for its support.
Funding: The associate editor coordinating the review of this paper and approving it for publication was J. Zhang.
Abstract: In this paper, an unmanned aerial vehicle (UAV)-aided wireless emergency communication system is studied, where a UAV is deployed to support ground user equipments (UEs) for emergency communications. We aim to maximize the number of UEs served, the fairness, and the overall uplink data rate by optimizing the trajectory of the UAV and the transmission power of the UEs. We propose a deep Q-network (DQN) based algorithm, which combines a deep neural network (DNN) with Q-learning, to solve the UAV trajectory problem. Then, based on the optimized UAV trajectory, we further propose a successive convex approximation (SCA) based algorithm to tackle the power control problem for each UE. Numerical simulations demonstrate that the proposed DQN-based algorithm achieves considerable performance gains over existing benchmark algorithms in terms of fairness, the number of UEs served, and the overall uplink data rate by optimizing the UAV's trajectory and the UEs' transmission power.
Funding: Project supported by the National Natural Science Foundation of China (No. 61501399) and the National Key R&D Program of China (No. 2018AAA0102302).
Abstract: In dense-traffic unmanned aerial vehicle (UAV) ad-hoc networks, traffic congestion can cause increased delay and packet loss, which limit the performance of the networks; therefore, a traffic balancing strategy is required to control the traffic. In this study, we propose TQNGPSR, a traffic-aware Q-network enhanced geographic routing protocol based on greedy perimeter stateless routing (GPSR), for UAV ad-hoc networks. The protocol enforces a traffic balancing strategy using the congestion information of neighbors, and evaluates the quality of a wireless link with the Q-network algorithm, which is a reinforcement learning algorithm. Based on the evaluation of each wireless link, the protocol makes routing decisions among multiple available choices to reduce delay and decrease packet loss. We simulate the performance of TQNGPSR and compare it with AODV, OLSR, GPSR, and QNGPSR. Simulation results show that TQNGPSR obtains higher packet delivery ratios and lower end-to-end delays than GPSR and QNGPSR. In high node density scenarios, it also outperforms AODV and OLSR in terms of packet delivery ratio, end-to-end delay, and throughput.
Funding: Supported by the National Key Research and Development Plan (2019YFB1706401).
Abstract: To optimize machine allocation and task dispatching in smart manufacturing factories, this paper proposes a manufacturing resource scheduling framework based on reinforcement learning (RL). The framework formulates the entire scheduling process as a multi-stage sequential decision problem, and obtains the scheduling order by combining a deep convolutional neural network (CNN) with an improved deep Q-network (DQN). Specifically, in the Markov decision process (MDP) representation, the feature matrix is taken as the state space and a set of heuristic dispatching rules is taken as the action space. In addition, the deep CNN is employed to approximate the state-action values, and the double dueling deep Q-network with prioritized experience replay and noisy network (D3QPN2) is adopted to determine the appropriate action for the current state. In the experiments, compared with the traditional heuristic method, the proposed method learns a high-quality scheduling policy and achieves a shorter makespan on standard public datasets.
Funding: Supported in part by the Anhui Province Natural Science Foundation (No. 2108085UD02), the National Natural Science Foundation of China (No. 51577047), and the 111 Project (No. BP0719039).
Abstract: High penetration of distributed renewable energy sources and electric vehicles (EVs) makes the future active distribution network (ADN) highly variable. These characteristics pose great challenges to traditional voltage control methods. Voltage control based on the deep Q-network (DQN) algorithm offers a potential solution to this problem because it possesses human-level control performance. However, traditional DQN methods may overestimate action reward values, which degrades the obtained solutions. In this paper, an intelligent voltage control method based on the averaged weighted double deep Q-network (AWDDQN) algorithm is proposed to overcome both the overestimation of action reward values in the DQN algorithm and the underestimation of action reward values in the double deep Q-network (DDQN) algorithm. In the proposed method, the voltage control objective is incorporated into the designed action reward values and normalized to form a Markov decision process (MDP) model, which is solved by the AWDDQN algorithm. The designed AWDDQN-based intelligent voltage control agent is trained offline and used as an online intelligent dynamic voltage regulator for the ADN. The proposed voltage control method is validated on the IEEE 33-bus and 123-bus systems containing renewable energy sources and EVs, and compared with methods based on the DQN and DDQN algorithms and with traditional mixed-integer nonlinear programming based methods. The simulation results show that the proposed method has better convergence and less voltage volatility than the other methods.
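To make the overestimation and underestimation contrast above concrete, the sketch below computes the two standard bootstrap targets that AWDDQN is said to improve on: the DQN target, which takes a max over the target network's own estimates, and the double DQN target, which selects the action with the online network and evaluates it with the target network. The weighting scheme of AWDDQN itself is not given in the abstract and is not reproduced here; the function and parameter names are hypothetical.

```python
def dqn_target(reward, gamma, q_target_next, done=False):
    """Standard DQN target: the max over noisy estimates tends to overestimate."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double DQN target: the online network selects the action and the
    target network evaluates it, which removes the max-induced bias."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# The two networks disagree on the best next action in this toy example.
y_dqn = dqn_target(1.0, 0.5, [4.0, 2.0])                     # 1 + 0.5 * 4 = 3.0
y_ddqn = double_dqn_target(1.0, 0.5, [1.0, 3.0], [4.0, 2.0])  # 1 + 0.5 * 2 = 2.0
```

For the same inputs the double DQN target is never larger than the DQN target, which is consistent with the abstract's claim that the two methods err in opposite directions.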
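Abstract: To meet the needs of modern conference management, a conference management system with a hybrid C/S and B/S deployment is designed and implemented. The system comprises a conference management service center and several conference sites. The service center includes a data server, an application server, a Web server, a communication gateway, and an egress router; each conference site includes several laptop computers, RFID (Radio Frequency Identification) card readers, QR code readers, information display and publishing devices, on-site WLAN equipment, and user terminals. RIA (Rich Internet Application) technology is used to optimize the B/S interface, RFID is applied to monitor details of the conference process, and the SaaS (Software as a Service) model enables on-demand configuration and rapid deployment of conference management. The system significantly improves conference management efficiency.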
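Abstract: Objective: To address low user quality of service and insufficient edge-node resources in vehicular edge computing. Methods: Combining vehicular edge computing and parking edge computing, an "end-multi-edge-cloud" cooperative computation offloading model is proposed, and a cooperative computation offloading and resource allocation algorithm based on DRL (DRL-CCORA) is designed. First, the computing power of roadside parked vehicles is organized into a parking edge server (PES), which works jointly with the edge nodes to provide computing services for vehicle tasks and relieve the load on the edge nodes. Second, the computation offloading and resource allocation problem is formulated as a Markov decision process model; a reward function is constructed from delay, energy consumption, and quality of service; and computing tasks are pre-classified according to the computing resources they require, their maximum tolerable delay, and the distance from the vehicle to the PES, which reduces the size of the problem. Finally, the double deep Q-network (DDQN) algorithm is used to obtain the optimal policy for computation offloading and resource allocation. Results: Compared with the baseline algorithms, the proposed algorithm improves total user quality of service by 6.25% and the task completion rate by 10.26%, and reduces task computation delay and energy consumption by 18.8% and 5.26%, respectively. Conclusion: The proposed algorithm optimizes the load on edge nodes, reduces the delay and energy consumption of task completion, and improves user quality of service.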