Funding: National Natural Science Foundation of China (No. 61973275).
Abstract: Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve a globally optimal path and real-time dynamic obstacle avoidance during movement, a dynamic path planning algorithm combining an improved IB-RRT∗ with deep reinforcement learning (DRL) is proposed. First, an improved IB-RRT∗ algorithm is proposed for global path planning by combining double elliptic subset sampling with a probabilistic central-circle target bias. Then, to address the slow response to dynamic obstacles and the inadequate obstacle avoidance of traditional local path planning algorithms, DRL is used to predict the movement trend of dynamic obstacles, yielding a dynamic fusion path planner. Finally, simulation and experimental results demonstrate that the improved IB-RRT∗ algorithm converges faster and searches more efficiently than the traditional Bi-RRT∗, Informed-RRT∗, and IB-RRT∗ algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
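The abstract above does not specify the double elliptic subset sampling. For reference, the block below sketches the standard single-ellipse informed sampling used by Informed-RRT∗, on which IB-RRT∗ builds. It is not the authors' improved variant, and the function name `sample_informed_ellipse` is hypothetical.

```python
import math
import random


def sample_informed_ellipse(start, goal, c_best, rng=random):
    """Draw one point uniformly from the 2-D ellipse whose foci are `start` and
    `goal` and whose major-axis length equals the current best path cost `c_best`."""
    (x1, y1), (x2, y2) = start, goal
    c_min = math.hypot(x2 - x1, y2 - y1)  # straight-line (focal) distance
    if c_best < c_min:
        raise ValueError("c_best cannot be shorter than the straight-line distance")
    a = c_best / 2.0                           # semi-major axis
    b = math.sqrt(c_best**2 - c_min**2) / 2.0  # semi-minor axis
    # Uniform point in the unit disk; sqrt(u) keeps the density uniform in area.
    r = math.sqrt(rng.random())
    t = 2.0 * math.pi * rng.random()
    ex, ey = a * r * math.cos(t), b * r * math.sin(t)
    # Rotate onto the start->goal axis and translate to the ellipse centre.
    theta = math.atan2(y2 - y1, x2 - x1)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return (cx + ex * math.cos(theta) - ey * math.sin(theta),
            cy + ex * math.sin(theta) + ey * math.cos(theta))
```

No point outside this ellipse can lie on a path shorter than `c_best`. Restricting samples to the ellipse therefore shrinks the search region as the best solution improves, which is the source of the faster convergence that informed RRT∗ variants report.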
Abstract: Autonomous navigation is a challenging task for mobile robots: a robot must travel from its initial position to its destination without colliding with anything in its environment. Reinforcement learning methods let a mobile robot learn a state-action function suited to its environment, and through trial-and-error interaction with its surroundings the robot finds an ideal behavior on its own. The Deep Q-Network (DQN) algorithm has been used on TurtleBot 3 (TB3) to reach the goal while successfully avoiding obstacles, but it requires a large number of training iterations. This research focuses on predicting a mobile robot's best path using the DQN and Artificial Potential Field (APF) algorithms. First, a DQN agent for the TB3 Waffle Pi is built and trained to reach the goal. Then the APF shortest-path algorithm is incorporated into the DQN algorithm. The proposed planning approach is compared with the standard DQN method in a virtual environment based on the Robot Operating System (ROS). The simulation results show that combining DQN and APF is effective: it gives a better path and takes less time than the conventional DQN algorithm. Compared with DQN, the proposed DQN+APF improves the number of successful targets by 88%, and the reported difference in average time is 0.331 s. In terms of average rewards relative to DQN, DQN+APF attains 85% for the positive goal and -90% for the negative goal.
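The abstract above does not say how APF is incorporated into DQN. As background, the block below sketches the textbook APF force: a quadratic attractive potential plus a Khatib-style repulsive potential. The gains `k_att`, `k_rep`, the influence radius `d0`, and both function names are hypothetical.

```python
import math


def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Textbook artificial-potential-field force at a 2-D position.

    The attractive potential 0.5*k_att*|pos-goal|^2 gives the force -k_att*(pos-goal).
    Each obstacle closer than d0 adds the negative gradient of
    0.5*k_rep*(1/d - 1/d0)^2, which pushes the robot directly away from it.
    """
    fx = -k_att * (pos[0] - goal[0])
    fy = -k_att * (pos[1] - goal[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d <= d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / d**2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy


def apf_step(pos, goal, obstacles, step=0.1, **gains):
    """Move a fixed distance along the normalized APF force direction."""
    fx, fy = apf_force(pos, goal, obstacles, **gains)
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # at the goal or stuck in a local minimum
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

A pure APF planner can stall in local minima where the forces cancel. One plausible motivation for pairing it with a learned policy is to escape such minima, but that reading is an assumption and not stated in the abstract.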
Funding: Supported by the Aeronautical Science Foundation (2017ZC53033).
Abstract: Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years. As the autonomous intelligence of UAVs continues to improve, swarm technology will become one of the main trends in future UAV development. This paper studies the behavior decision-making process of a UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function to address the convergence problems caused by sparse rewards in deep reinforcement learning (DRL) for long-horizon tasks. We also propose the concept of a temporary storage area, which optimizes the experience replay unit of the traditional DDQN algorithm, improves its convergence speed, and speeds up training. Unlike traditional task environments, the one in this paper is a continuous state-space task environment model, which improves the authenticity of the UAV task environment. Based on the DDQN algorithm, the UAV swarm is trained on collaborative tasks in different task scenarios. The experimental results validate that the DDQN algorithm efficiently trains the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for both centralization and autonomy, improving the intelligence of collaborative task execution. The simulation results show that after training, the UAV swarm performs the rendezvous task well, with a mission success rate of 90%.
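The abstract above relies on DDQN but gives neither the guided reward nor the temporary storage area in detail. The block below sketches only the standard DDQN target computation under stated assumptions. Q-functions are plain callables returning lists of action values, and the function name `ddqn_targets` is hypothetical.

```python
def ddqn_targets(batch, q_online, q_target, gamma=0.99):
    """Double-DQN regression targets for a minibatch.

    batch: iterable of (state, action, reward, next_state, done) tuples.
    q_online, q_target: callables mapping a state to a list of action values.
    The online network selects the next action and the target network evaluates it,
    which reduces the overestimation bias of the plain DQN max operator.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)  # no bootstrapping past a terminal state
            continue
        q_next = q_online(next_state)
        best_action = max(range(len(q_next)), key=q_next.__getitem__)
        targets.append(reward + gamma * q_target(next_state)[best_action])
    return targets
```

Consider toy values `q_online = [1.0, 5.0]`, `q_target = [10.0, 2.0]`, reward 1.0, and `gamma = 0.5`. Plain DQN would bootstrap from max(q_target) = 10.0 and give a target of 6.0. DDQN instead evaluates the online network's choice (action 1) and gives 2.0.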
Funding: Funded by the project “Research on Digitization and Intelligent Application of Low-Voltage Power Distribution Equipment” [SGSDDK00PDJS2000375].
Abstract: The main function of the power communication business is to monitor, control, and manage the power communication network to ensure its normal and stable operation. Communication services related to dispatching data networks, fault information transmission, and feeder automation have strict delay requirements: if processing time is prolonged, a cascade reaction in the power business may be triggered. To solve these problems, this paper establishes an edge object-linked (IoT) agent business deployment model for the power communication network. The model unifies the management of data collection, resource allocation, and task scheduling within the system and virtualizes the agent's computing resources through Docker container technology. The paper designs an objective model of network latency and energy consumption, introduces the A3C algorithm from deep reinforcement learning, improves it according to the characteristics of the scenario, and sets corresponding optimization strategies to minimize network delay and energy consumption. At the same time, to ensure that delay-sensitive power business is handled in time, this paper designs a business dispatch model and a task migration model that address server failure. Finally, a simulation program is designed to verify the feasibility and validity of the method and to compare it with other existing mechanisms.
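The abstract above names A3C but not its reward or network design. The block below is a minimal sketch of two assumed pieces. The first is a hypothetical reward, the negative weighted sum of delay and energy, with made-up weights `w_delay` and `w_energy`. The second is the n-step return and advantage computation that each A3C worker performs on a rollout segment.

```python
def latency_energy_reward(delay, energy, w_delay=0.5, w_energy=0.5):
    """Hypothetical per-step reward: the negative weighted sum of delay and energy,
    so maximizing return minimizes both."""
    return -(w_delay * delay + w_energy * energy)


def n_step_returns_and_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """n-step returns and advantages for one A3C worker rollout.

    rewards[t] and values[t] = V(s_t) for t = 0..n-1; bootstrap_value = V(s_n),
    or 0.0 if s_n is terminal. The actor is updated along grad log pi(a_t|s_t) * A_t
    and the critic regresses V(s_t) toward the returns.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages
```

In A3C, several workers run this computation in parallel on their own environment copies and asynchronously push gradients to shared actor and critic parameters. The scenario-specific improvements mentioned in the abstract are not reflected here.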
Funding: Supported by the National Key Research and Development Program of China (No. 2021YFE0116900), the National Natural Science Foundation of China (Nos. 42275157, 62002276, and 41975142), and the Major Program of the National Social Science Fund of China (No. 17ZDA092).
Abstract: As business density rises, edge computing nodes undertake an increasing number of tasks, so efficiently allocating large-scale, dynamic workloads to edge computing resources has become a critical challenge. This study proposes an edge task scheduling approach based on an improved Double Deep Q-Network (DDQN), which uses two networks to separate the calculation of target Q-values from action selection. A new reward function is designed, and a control unit is added to the agent's experience replay unit. The management of experience data is also modified to make full use of its value and improve learning efficiency. Reinforcement learning agents usually learn from scratch, which is inefficient. This study therefore proposes a novel particle swarm optimization (PSO) algorithm with an improved fitness function that generates optimal task-scheduling solutions. These solutions are provided to the agent to pre-train the network parameters, giving it a better starting level of knowledge. In simulation experiments, the proposed algorithm is compared with six other methods, and the results show that it outperforms these benchmarks in terms of makespan.
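The abstract above does not give the improved fitness function or the solution encoding. The block below is a minimal sketch of plain global-best PSO for task-to-node assignment, using makespan as the fitness. Continuous particle coordinates are decoded to node indices by truncation. All names, coefficients, and the encoding are assumptions for illustration.

```python
import random


def makespan(assign, task_len, node_speed):
    """Completion time of the busiest node when task i runs on node assign[i]."""
    load = [0.0] * len(node_speed)
    for task, node in enumerate(assign):
        load[node] += task_len[task] / node_speed[node]
    return max(load)


def decode(position, n_nodes):
    """Map each continuous coordinate in [0, n_nodes) to a node index."""
    return [min(int(x), n_nodes - 1) for x in position]


def pso_schedule(task_len, node_speed, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO minimizing makespan; returns (assignment, makespan)."""
    rng = random.Random(seed)
    n_tasks, n_nodes = len(task_len), len(node_speed)

    def fitness(p):
        return makespan(decode(p, n_nodes), task_len, node_speed)

    pos = [[rng.uniform(0, n_nodes) for _ in range(n_tasks)] for _ in range(n_particles)]
    vel = [[0.0] * n_tasks for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the decodable range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), n_nodes - 1e-9)
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest, n_nodes), gbest_fit
```

In the scheme the abstract describes, schedules like the returned assignment would serve as demonstrations for pre-training the DDQN. That step is not shown here.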
Funding: Supported by the National Natural Science Foundation of China (Nos. 51805152 and 52075401), the Green Industry Technology Leading Program of Hubei University of Technology (No. XJ2021005001), the Scientific Research Foundation for High-level Talents of Hubei University of Technology (No. GCRC2020009), and the Natural Science Foundation of Hubei Province (No. 2022CFB445).
Abstract: To obtain a suitable scheduling scheme within an acceptable time, this paper takes the minimum completion time (makespan) as the objective of Flexible Job Shop Scheduling Problems (FJSP) of different scales and applies Composite Dispatching Rules (CDRs) to generate feasible solutions. First, the binary tree coding method is adopted, and the constructed function set is normalized. Second, a CDR mining approach based on an Improved Genetic Programming Algorithm (IGPA) is designed. Two population initialization methods are introduced to enrich the initial population, and a strategy that separates superior and inferior populations is designed to improve the algorithm's global search ability. At the same time, two individual mutation methods are introduced to improve its local search ability and balance global and local search. In addition, comparative analysis verifies the effectiveness of the IGPA and the superiority of the CDRs. Finally, Deep Reinforcement Learning (DRL) is employed to solve the FJSP with the CDRs as its action set, and the number of times each CDR is selected is counted to further verify their superiority.
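A composite dispatching rule encoded as a binary tree, as in the abstract above, is evaluated for each candidate operation. The scheduler then dispatches the candidate with the best priority. The block below is a minimal sketch under assumed conventions. Internal nodes are `(op, left, right)` tuples with protected division, and leaves are feature names or constants. The feature names `PT` (processing time) and `MW` (machine workload) are hypothetical, and lower priority is treated as better.

```python
import operator


def protected_div(a, b):
    # Common GP convention: return 1.0 instead of failing on division by zero.
    return a / b if abs(b) > 1e-12 else 1.0


FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}


def evaluate(tree, features):
    """Evaluate a binary expression tree.
    Internal node: (op, left, right); leaf: feature name (str) or numeric constant."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):
        return features[tree]
    return float(tree)


def dispatch(tree, candidates):
    """Index of the candidate (operation, machine) pair with the lowest priority value."""
    return min(range(len(candidates)), key=lambda i: evaluate(tree, candidates[i]))
```

For example, the rule `("+", "PT", ("/", "MW", 2.0))` scores the candidates `{PT: 5, MW: 4}`, `{PT: 3, MW: 8}`, and `{PT: 4, MW: 2}` as 7, 7, and 5, so it dispatches the third. Genetic programming would evolve such trees through crossover and mutation of subtrees.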
Abstract: In this paper we discuss policy iteration methods for the approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
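The block below is a minimal sketch of the aggregation idea described above, for the simplest case. It assumes hard aggregation, where each state belongs to the aggregate state given by its feature value, and uniform disaggregation probabilities. It solves the aggregate Bellman equation by value iteration, uses the piecewise-constant cost r[feature(j)] as the approximation of J(j), and then performs one greedy policy improvement. The data layout (`P[u][i][j]`, `G[u][i][j]`) and all function names are assumptions.

```python
def bellman_q(P, G, feature, r, alpha, i, u):
    """Expected cost of control u at state i, using r[feature(j)] as the cost-to-go of j."""
    return sum(P[u][i][j] * (G[u][i][j] + alpha * r[feature[j]])
               for j in range(len(feature)))


def aggregate_value_iteration(P, G, feature, alpha=0.9, iters=500):
    """Value iteration on the aggregate problem (hard aggregation, uniform disaggregation).
    P[u][i][j]: transition probability; G[u][i][j]: stage cost; feature[i]: aggregate state."""
    groups = {}
    for i, y in enumerate(feature):
        groups.setdefault(y, []).append(i)
    r = {y: 0.0 for y in groups}
    for _ in range(iters):
        # r(y) = average over member states i of min_u Q(i, u)
        r = {y: sum(min(bellman_q(P, G, feature, r, alpha, i, u) for u in range(len(P)))
                    for i in members) / len(members)
             for y, members in groups.items()}
    return r


def greedy_policy(P, G, feature, r, alpha=0.9):
    """Policy improvement: choose the control minimizing Q(i, u) under the aggregate costs."""
    return [min(range(len(P)), key=lambda u: bellman_q(P, G, feature, r, alpha, i, u))
            for i in range(len(feature))]
```

Because J is approximated by looking up the aggregate cost of a state's feature value, the approximation can be an arbitrary nonlinear (here piecewise-constant) function of the features. This is the property the abstract contrasts with a linear function of the features.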
Funding: Supported by the Artificial Intelligence Key Laboratory of Sichuan Province (No. 2019RYJ05) and the National Natural Science Foundation of China (No. 61971107).
Abstract: Unmanned Aerial Vehicles (UAVs) have emerged as a promising technology for supporting human activities such as target tracking, disaster rescue, and surveillance. However, these tasks require heavy image and video processing, which places enormous pressure on the UAV computation platform. To solve this issue, we propose an intelligent Task Offloading Algorithm (iTOA) for UAV edge computing networks. Unlike existing methods, iTOA perceives the network environment intelligently and decides the offloading action using deep Monte Carlo Tree Search (MCTS), the core algorithm of AlphaGo. MCTS simulates offloading decision trajectories and selects the best decision by maximizing a reward, such as the lowest latency or power consumption. To accelerate the search convergence of MCTS, we also propose a splitting Deep Neural Network (sDNN) that supplies prior probabilities to MCTS. The sDNN is trained by a self-supervised learning manager, with the training data set generated by iTOA itself, which acts as its own teacher. Compared with game-theory-based and greedy-search-based methods, iTOA improves service latency by 33% and 60%, respectively.
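The abstract above says the sDNN supplies prior probabilities to MCTS but does not give the selection rule. The block below sketches the AlphaGo Zero-style PUCT rule, in which a network prior biases which child is explored, together with the matching backup step. Whether iTOA uses exactly this formula is an assumption. The child dictionaries (`P`, `N`, `W`), the constant `c_puct`, and the function names are hypothetical.

```python
import math


def puct_select(children, c_puct=1.5):
    """Index of the child maximizing Q + U, with U = c_puct * P * sqrt(N_parent) / (1 + N).
    Each child is a dict: 'P' prior from the network, 'N' visit count, 'W' total value."""
    n_parent = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0  # mean simulated reward
        u = c_puct * ch["P"] * math.sqrt(n_parent) / (1 + ch["N"])  # prior-driven bonus
        return q + u

    return max(range(len(children)), key=lambda k: score(children[k]))


def backup(path_children, value):
    """Propagate a simulated reward (e.g. negative latency) along the selected path."""
    for ch in path_children:
        ch["N"] += 1
        ch["W"] += value
```

Take three children with priors 0.1, 0.6, and 0.3 and visit counts 10, 1, and 0. The second child wins, because its high prior and low visit count outweigh the first child's better mean value. A good prior therefore steers simulations toward promising offloading actions early, which is how a learned network can speed up search convergence.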
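Abstract: In autonomous robotic grasping, the size, shape, and spatial distribution of the objects to be grasped are random, so clearing all objects from the workspace with grasping actions alone is very difficult. Combining pushing with grasping can reduce the complexity of the grasping environment, because pushing rearranges the objects so that they are easier to grasp. However, adding push actions also produces some ineffective pushes, which lower the model's learning efficiency. Building on the visual pushing for grasping (VPG) model, which is based on the deep Q-network (DQN), an affordance scheme is proposed that reduces the search complexity of the robot's action-planning space and accelerates the robot's grasp learning. Reducing the number of actions available in any given situation makes planning faster and helps the model learn from data more efficiently and accurately. Finally, the effectiveness of the proposed method is verified in simulated scenes on the V-REP simulation platform.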
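The abstract above does not describe how affordances are computed. One common way to "reduce the number of actions available" in a DQN is to mask non-afforded actions out of both action selection and the bootstrap target. The block below is a minimal sketch of that masking. The boolean mask is assumed to come from some hypothetical affordance test, for example whether a push or grasp location touches an object, and the function names are illustrative.

```python
def masked_greedy_action(q_values, affordance_mask):
    """Greedy action restricted to afforded actions.
    q_values[a]: Q-value of action a; affordance_mask[a]: True if action a is afforded."""
    valid = [a for a, ok in enumerate(affordance_mask) if ok]
    if not valid:
        raise ValueError("no afforded action available")
    return max(valid, key=lambda a: q_values[a])


def masked_td_target(reward, next_q, next_mask, gamma=0.5, done=False):
    """One-step DQN target that bootstraps only over afforded next actions."""
    if done:
        return reward
    return reward + gamma * max(q for q, ok in zip(next_q, next_mask) if ok)
```

Masking the target as well as the action choice keeps the agent from bootstrapping on Q-values of actions it would never be allowed to take. This is one plausible reading of how a smaller action space could speed up learning.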