Abstract: Autonomous navigation of mobile robots is a challenging task that requires a robot to travel from its initial position to its destination without collisions. Reinforcement learning methods let a mobile robot learn a state-action function suited to its environment: through trial-and-error interaction with its surroundings, the robot finds an ideal behavior on its own. The Deep Q Network (DQN) algorithm is used on the TurtleBot 3 (TB3) to reach the goal while avoiding obstacles, but it requires a large number of training iterations. This research focuses on best-path prediction for a mobile robot using the DQN and Artificial Potential Field (APF) algorithms. First, a DQN for the TB3 Waffle Pi is built and trained to reach the goal. Then the APF shortest-path algorithm is incorporated into the DQN algorithm. The proposed planning approach is compared with the standard DQN method in a virtual environment based on the Robot Operating System (ROS). The simulation results show that the combination of DQN and APF is effective: it gives a better path and takes less time than the conventional DQN algorithm. The reported performance improvement of the proposed DQN+APF over DQN is 88% in the number of successful targets and 0.331 s in average time. For average rewards, the reported figures are 85% for the positive goal and -90% for the negative goal.
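The abstract does not say how APF is incorporated into DQN. As a rough illustration of the APF side only, the sketch below computes a classic attractive-plus-repulsive force and picks the discrete heading change closest to its direction. The gains `k_att` and `k_rep`, the influence radius `d0`, and the five-angle action set are illustrative assumptions, not values from the paper.

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Return the (fx, fy) potential-field force at pos.

    The attractive term pulls linearly toward the goal; each obstacle
    closer than d0 adds a repulsive term pointing away from it.
    All gains are assumed values for illustration.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            # Negative gradient of 0.5 * k_rep * (1/d - 1/d0)^2.
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy

def apf_action(pos, heading, goal, obstacles,
               turn_angles=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Index of the turn angle (radians, hypothetical action set) whose
    resulting heading is closest to the force direction."""
    fx, fy = apf_force(pos, goal, obstacles)
    desired = math.atan2(fy, fx)

    def angular_gap(turn):
        diff = heading + turn - desired
        # Wrap the difference into [-pi, pi] before taking its size.
        return abs(math.atan2(math.sin(diff), math.cos(diff)))

    return min(range(len(turn_angles)), key=lambda i: angular_gap(turn_angles[i]))
```

One plausible use, again an assumption rather than the paper's stated design, is to call `apf_action` in place of a uniformly random action during exploration, so that early episodes are biased toward the goal.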
Funding: This work was supported by the Fundamental Research Funds for the Central Universities (No. 2019XD-A07), the Director Fund of Beijing Key Laboratory of Space-ground Interconnection and Convergence, and the National Key Laboratory of Science and Technology on Vacuum Electronics.
Abstract: The main aim of future mobile networks is to provide secure, reliable, intelligent, and seamless connectivity. Such networks also enable mobile network operators to ensure a better quality of service (QoS) for their customers. Unmanned Aerial Vehicles (UAVs) are now a significant part of the mobile network because of their continuously growing use in various applications. For better coverage and cost-effective, seamless service connectivity and provisioning, UAVs have emerged as the best choice for telco operators. UAVs can be used as flying base stations, edge servers, and relay nodes in mobile networks. Multi-access Edge Computing (MEC) technology has likewise emerged in the 5G network to provide a better quality of experience (QoE) to users with different QoS requirements. However, using UAVs in a mobile network for coverage enhancement and better QoS faces several challenges, such as trajectory design, path planning, optimization, QoS assurance, and mobility management. Efficient and proactive path planning and optimization is challenging in a highly dynamic environment containing buildings and obstacles, so an automated, Artificial Intelligence (AI)-enabled, QoS-aware solution is needed for trajectory planning and optimization. This work therefore introduces an AI- and MEC-enabled architecture for a UAV-assisted future network. It includes an efficient Deep Reinforcement Learning (DRL) algorithm for real-time and proactive trajectory planning and optimization, and it also fulfills QoS-aware service provisioning. A greedy-policy approach is used to maximize the long-term reward for serving more users with QoS. Simulation results reveal the superiority of the proposed DRL mechanism for energy-efficient and QoS-aware trajectory planning over the existing models.
Funding: Supported by the Aeronautical Science Foundation (2017ZC53033).
Abstract: Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years. With the continuous improvement of UAV autonomous intelligence, swarm technology will become one of the main trends of future UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q network (DDQN) algorithm. We design a guided reward function to address the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-period tasks. We also propose the concept of a temporary storage area, which optimizes the memory playback unit of the traditional DDQN algorithm, improves the convergence speed of the algorithm, and speeds up training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to improve the authentication process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm are trained in different task scenarios. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy and improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, with a mission success rate of 90%.
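For readers unfamiliar with DDQN, the following minimal sketch shows the standard double-DQN target, in which the online network selects the next action and the target network evaluates it. The `guided_reward` helper is a hypothetical distance-based shaping term, included only to illustrate how a dense reward can offset sparse returns; the paper's actual reward function is not given in the abstract.

```python
def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target for one transition.

    next_q_online / next_q_target are lists of Q-values for the next
    state from the online and target networks respectively.
    """
    if done:
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while its value is read from the target network.
    return reward + gamma * next_q_target[best_action]

def guided_reward(prev_dist, dist, reached, bonus=10.0, scale=1.0):
    """Hypothetical shaping: reward progress toward the rendezvous point
    at every step, plus a terminal bonus on arrival (assumed values)."""
    if reached:
        return bonus
    return scale * (prev_dist - dist)
```

With `next_q_online = [0.2, 0.9, 0.1]` and `next_q_target = [0.5, 0.3, 0.8]`, the target bootstraps from 0.3 (the target network's value for the online network's choice) rather than from the target network's own maximum of 0.8, which is the overestimation that double DQN is meant to reduce.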
Funding: Supported by the Natural Science Basic Research Program of Shaanxi (2022JQ-593).
Abstract: To address the shortcomings of single-step decision making in existing deep reinforcement learning approaches to real-time unmanned aerial vehicle (UAV) path planning, a real-time UAV path planning algorithm based on a long short-term memory network (RPP-LSTM) is proposed, which combines the memory characteristics of the recurrent neural network (RNN) with the deep reinforcement learning algorithm. LSTM networks are used as the Q-value networks of the deep Q network (DQN) algorithm, which gives the decisions of the Q-value network some memory. Thanks to the LSTM network, the Q-value network can use previous environmental and action information, which avoids the problem of a single-step decision considering only the current environment. In addition, the algorithm proposes a hierarchical reward and punishment function for the specific problem of real-time UAV path planning, so that the UAV can plan paths more reasonably. Simulation shows that, compared with the traditional feed-forward neural network (FNN) based UAV autonomous path planning algorithm, the proposed RPP-LSTM adapts to more complex environments and has significantly improved robustness and accuracy in real-time UAV path planning.
Funding: Supported by the National Natural Science Foundation of China (No. 61931011).
Abstract: Unmanned aerial vehicles (UAVs) are increasingly considered in safe autonomous navigation systems for exploring unknown environments, where UAVs are equipped with multiple sensors to perceive their surroundings. However, achieving UAV-enabled data dissemination while simultaneously ensuring safe navigation is a new challenge. In this paper, our goal is to minimize the overall weighted sum of the UAV's task completion time while satisfying the data transmission task requirement and the UAV's feasible flight region constraints. This problem cannot be solved via standard optimization methods, mainly because a tractable and accurate system model is lacking in practice. To overcome this issue, we propose a new solution approach that uses the dueling double deep Q network (dueling DDQN) with multi-step learning. Specifically, to improve the algorithm, extra labels are added to the primitive states. Simulation results indicate the validity and performance superiority of the proposed algorithm under different data thresholds compared with two other benchmarks.
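The two ingredients named above, the dueling value decomposition and the multi-step return, can be sketched in a few lines. This is a generic illustration of the standard formulas, not the paper's implementation; the function names and default `gamma` are assumptions.

```python
def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into
    Q-values using the usual mean-subtracted dueling aggregation."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step return: r_0 + gamma*r_1 + ... + gamma^n * bootstrap_value.

    In a dueling DDQN, bootstrap_value would typically be the target
    network's Q-value for the action chosen by the online network at
    the state reached after the n steps.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `n_step_return([1.0, 1.0], 4.0, gamma=0.5)` evaluates to `1 + 0.5 * 1 + 0.25 * 4 = 2.5`.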
Funding: Supported by the Natural Science Basic Research Program of Shaanxi (Program No. 2022JQ-593).
Abstract: To improve the performance of UAV autonomous maneuvering decision-making, this paper proposes a decision-making method based on situational continuity. The algorithm designs a situation evaluation function with strong guidance, then trains a Long Short-Term Memory (LSTM) network under the framework of the Deep Q Network (DQN) for air combat maneuvering decision-making. To account for the continuity between adjacent situations, the method takes multiple consecutive situations as one input of the neural network. To reflect the difference between adjacent situations, the method uses the difference in situation evaluation value as the reinforcement learning reward. In different scenarios, the proposed algorithm is compared with an algorithm based on the Fully Neural Network (FNN) and an algorithm based on statistical principles. The results show that, compared with the FNN algorithm, the proposed algorithm is more accurate and forward-looking. Compared with the algorithm based on statistical principles, its decision-making is more efficient and its real-time performance is better.
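The two ideas described here, feeding several consecutive situations as one network input and rewarding the change in situation evaluation, can be sketched as follows. The class name, window length, and padding rule are illustrative assumptions; the abstract does not specify them.

```python
from collections import deque

class SituationWindow:
    """Keeps the most recent `length` situation vectors so they can be
    fed to a sequence model (for example an LSTM) as one input."""

    def __init__(self, length=4):
        self.frames = deque(maxlen=length)

    def push(self, situation):
        self.frames.append(list(situation))

    def as_input(self):
        if not self.frames:
            raise ValueError("no situations recorded yet")
        # Early in an episode, pad by repeating the oldest situation
        # (an assumed convention, chosen for simplicity).
        pad = [self.frames[0]] * (self.frames.maxlen - len(self.frames))
        return pad + list(self.frames)

def difference_reward(eval_prev, eval_next):
    """Reward as the change in situation evaluation between adjacent steps."""
    return eval_next - eval_prev
```

A positive reward then means the maneuver improved the evaluated situation, and a negative one means it made the situation worse, regardless of the absolute evaluation level.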
Funding: Supported by the National Research Foundation (NRF), Singapore, and the Civil Aviation Authority of Singapore (CAAS), under the Aviation Transformation Programme (ATP).
Abstract: Unmanned aerial vehicles (UAVs) have gained much attention from academia and industry because of their many potential applications in urban airspace. A traffic management system is needed to manage this future traffic. Tactical conflict resolution for unmanned aerial systems (UASs) is an essential piece of the puzzle for future UAS Traffic Management (UTM), especially in very low-level (VLL) urban airspace. Unlike conflict resolution in higher-altitude airspace, dense high-rise buildings are an essential source of potential conflict in VLL urban airspace. In this paper, we propose an attention-based deep reinforcement learning approach to solve the tactical conflict resolution problem. Specifically, we formulate the task as a sequential decision-making problem using a Markov Decision Process (MDP). The double deep Q network (DDQN) framework is used as the learning framework for the host drone to learn to output conflict-free maneuvers at each time step. We use the attention mechanism to model each individual neighbor's effect on the host drone, which allows the learned conflict resolution policy to adapt to an arbitrary number of neighboring drones. Lastly, we build a simulation environment with various scenarios covering different types of encounters to evaluate the proposed approach. The simulation results demonstrate that the proposed algorithm provides a reliable solution that minimizes secondary conflict counts compared with learning-based and non-learning-based approaches under different traffic density scenarios.
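To see why attention lets a policy handle an arbitrary number of neighbors, consider the minimal sketch below: it scores each neighbor against the host, normalizes the scores with a softmax, and returns a fixed-length weighted sum. It uses raw feature vectors and scaled dot-product scoring as simplifying assumptions; the paper's actual attention architecture (for example, learned projections) is not described in the abstract.

```python
import math

def attention_pool(host, neighbors):
    """Scaled dot-product attention of the host feature vector over a
    variable-length list of neighbor feature vectors.

    Returns (weights, context); context always has len(host) entries,
    so a downstream Q-network sees a fixed-size input.
    """
    if not neighbors:
        return [], [0.0] * len(host)
    scale = math.sqrt(len(host))
    scores = [sum(h * n for h, n in zip(host, nb)) / scale for nb in neighbors]
    # Subtract the maximum score for a numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * nb[i] for w, nb in zip(weights, neighbors))
               for i in range(len(host))]
    return weights, context
```

Whether there are two neighbors or twenty, `context` has the same length, which is the property that makes the pooled representation usable as input to a fixed-size network.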
Funding: Supported by the Fundamental Research Funds Program for the Central Universities (No. 2019MS014) and the Key-Area Research and Development Program of Guangdong Province (No. 2020B010166004).
Abstract: With the integration of alternative energy and renewables, the stability and resilience of the power network have received considerable attention. Fault identification and location are the basic prerequisites for fault diagnosis and isolation. Conventional intelligent fault identification methods need supervision, manual labelling of characteristics, and large amounts of labelled data. To enhance the ability of intelligent methods and remove the dependence on large amounts of labelled data, this paper investigates a novel fault identification method based on deep reinforcement learning (DRL), which has not received enough attention in the field of fault identification. The proposed method uses different faults as parameters of the model to expand the scope of fault identification. In addition, the DRL algorithm can intelligently modify the fault parameters according to the observations obtained from the power network environment, rather than requiring manual and mechanical tuning of parameters. The methodology was tested on the IEEE 14-bus system for several scenarios, and the performance of the proposed method was compared with that of population-based optimization methods and supervised learning methods. The results confirm the feasibility and effectiveness of the proposed method.
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2018YFC1900804 and 2019YFC1906002).
Abstract: With the appearance of a huge number of reusable electronic products, precise value evaluation has become an urgent problem in the recycling process. Traditional methods rely mostly on manual intervention. To make the model better suited to dynamic updating, this paper proposes a reinforcement learning based value prediction model for electronic products, which integrates market information to achieve timely and stable prediction results. The basic attributes and depreciation attributes of the product are modeled separately by two parallel neural networks to learn their different effects on the prediction. Most importantly, the double deep Q network is adopted to fuse market information through a reinforcement learning strategy, and training on data from older products can be used to predict the value of subsequently released products, which alleviates the cold-start problem. Experiments on data from a real mobile phone recycling platform verify that the model achieves higher accuracy and better generalization ability.