Reinforcement Learning (RL) based control algorithms can learn control strategies for nonlinear and uncertain environments while interacting with them. Guided by the rewards generated by the environment, an RL agent can learn the control strategy directly, in a model-free way, instead of investigating the dynamic model of the environment. In this paper, we propose a sampled-data RL control strategy to reduce the computational demand. In the sampled-data control strategy, the whole control system has a hybrid structure, in which the plant is continuous while the controller (RL agent) adopts a discrete structure. Given that the continuous states of the plant are the input of the agent, the state–action value function is approximated by fully connected feed-forward neural networks (FCFFNN). Instead of learning the controller at every step during the interaction with the environment, the learning and acting stages are decoupled so that the control strategy is learned more effectively through experience replay. In the acting stage, the most effective experience obtained during the interaction with the environment is stored; in the learning stage, the stored experience is replayed a customized number of times, which enhances the experience replay process. The effectiveness of the proposed approach is verified by simulation examples.
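The decoupled acting and learning stages described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reward threshold used to select "the most effective experience", the buffer capacity, and the replay count are all assumed placeholder values.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores only transitions deemed effective (here: reward above a threshold)."""

    def __init__(self, capacity=10000, reward_threshold=0.0):
        self.buffer = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold

    def store(self, state, action, reward, next_state):
        # Acting stage: keep only experience judged effective enough to learn from.
        if reward >= self.reward_threshold:
            self.buffer.append((state, action, reward, next_state))

    def replay(self, update_fn, times=3, batch_size=32):
        # Learning stage: replay the stored experience a customized number of times,
        # each time feeding a random minibatch to the learner's update function.
        for _ in range(times):
            batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
            update_fn(batch)
```

Separating `store` (acting) from `replay` (learning) is what lets the controller run at a lower, sampled-data rate while the learner revisits the same stored experience several times.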
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network is used to resolve the "curse of dimensionality" issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data that is used for training a neural network; such a process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, in comparison with existing methods, our method converges to an optimal action strategy in less time and can explore a path in an unknown environment with fewer steps and a larger average reward.
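The abstract does not specify the form of the heuristic knowledge, so the sketch below assumes a generic per-action heuristic score (for example, negative distance to the goal) that biases both exploration and exploitation; the function name and the weighting scheme are hypothetical.

```python
import random


def select_action(q_values, heuristic_scores, epsilon=0.1, heuristic_weight=0.5):
    """Epsilon-greedy action selection biased by heuristic knowledge.

    q_values, heuristic_scores: lists of equal length, one entry per action.
    Instead of exploring uniformly at random, the heuristic steers the robot
    away from blind exploration toward actions it rates highly.
    """
    if random.random() < epsilon:
        # Explore, but prefer the action the heuristic rates best.
        return max(range(len(q_values)), key=lambda a: heuristic_scores[a])
    # Exploit: combine the learned Q-value with the heuristic score.
    return max(range(len(q_values)),
               key=lambda a: q_values[a] + heuristic_weight * heuristic_scores[a])
```

Transitions gathered this way tend to be more informative, so the minibatches drawn from the replay buffer train the network on fewer wasted steps.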
In this paper, a data-based feedback relearning algorithm is proposed for the robust control problem of uncertain nonlinear systems. Motivated by the classical on-policy and off-policy algorithms of reinforcement learning, the online feedback relearning (FR) algorithm is developed, in which the collected data includes the influence of disturbance signals. The FR algorithm adapts better to environmental changes (such as control channel disturbances) than the off-policy algorithm, and has higher computational efficiency and better convergence performance than the on-policy algorithm. Data processing based on experience replay is used to improve data efficiency and convergence stability. Simulation experiments are presented to illustrate the convergence stability, optimality, and performance of the FR algorithm by comparison.
To address the dynamic attrition of UAV numbers during multi-UAV game confrontation, as well as the sparse-reward problem and the excessively frequent sampling of ineffective experience in conventional deep reinforcement learning algorithms, this paper takes multi-UAV game confrontation under limited attack/defense capability and communication range as its research background and builds a game confrontation model of red and blue UAV swarms. Within the Actor-Critic framework of the multi-agent deep deterministic policy gradient (MADDPG) algorithm, the original MADDPG algorithm is improved according to the characteristics of the game environment. To further improve the exploration and exploitation of effective experience, a rule-coupling module is constructed to assist the Actor network during UAV decision making. Simulation experiments show that the proposed algorithm achieves gains in convergence speed, learning efficiency, and stability; the introduction of heterogeneous sub-networks makes the algorithm better suited to game scenarios with dynamically decaying UAV numbers; the prioritized experience replay method coupling a reward potential function with importance weights refines the differentiation of experiences and improves the utilization of advantageous experience; and the rule-coupling module enables the UAV decision network to effectively exploit prior knowledge.
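This abstract couples a reward potential function with importance weights in prioritized experience replay. The standard proportional prioritized sampling that such schemes build on can be sketched as follows; the exponents `alpha` and `beta` and the priority definition are conventional assumptions, not values from the paper.

```python
import random


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4):
    """Proportional prioritized sampling with importance-sampling weights.

    priorities: per-transition priorities (commonly |TD error| + small constant).
    Returns sampled indices and the importance-sampling weights that correct
    the bias introduced by non-uniform sampling.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    indices = random.choices(range(len(priorities)), weights=probs, k=batch_size)
    n = len(priorities)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    # Normalise by the maximum weight so updates stay bounded.
    return indices, [w / max_w for w in weights]
```

A reward-potential term, as described above, would enter by shaping the priority assigned to each transition before this sampling step.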
With the continuous development of network technology, the distributed network control model based on the Fat-Tree topology has gradually revealed its limitations, and software-defined data center network (SDCN) technology, an improvement on the Fat-Tree topology, has attracted increasing attention from researchers. This paper first builds an edge computing architecture within an SDCN and a task offloading model based on the three-layer service architecture of a mobile edge computing (MEC) platform. Combining practical MEC application scenarios, the traditional deep Q-network (DQN) algorithm is improved with same-policy experience replay and entropy regularization to optimize the task offloading strategy of the MEC platform. Experiments compare the improved DQN algorithm based on same strategy empirical playback and entropy regularization (RSS2E-DQN) with three other algorithms in terms of load balancing, energy consumption, delay, and network usage, verifying that the improved algorithm performs better in all four respects.
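The entropy-regularization idea mentioned above is commonly realized by replacing the hard max in the Q-learning target with a log-sum-exp soft maximum, which adds an entropy bonus that keeps the policy exploratory. The sketch below shows that substitution under an assumed temperature parameter; it is a generic soft target, not the RSS2E-DQN update itself.

```python
import math


def soft_q_target(reward, next_q_values, gamma=0.99, temperature=1.0):
    """Entropy-regularised (soft) Q-learning target.

    Replaces max(next_q_values) with the soft maximum
    temperature * log(sum(exp(q / temperature))), computed stably
    by subtracting the largest Q-value before exponentiating.
    """
    m = max(next_q_values)
    soft_max = m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in next_q_values))
    return reward + gamma * soft_max
```

As `temperature` approaches zero the soft maximum approaches the ordinary hard max, recovering standard DQN targets.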
For the path planning task of a mobile robot performing interior-wall operations across multiple rooms, this paper proposes a two-stage path planning method. In the first stage, to handle sensor failures caused by dust or mist during wall-following operations, as well as incomplete planned paths when a room has multiple exits, we propose a wall-following path planning method with automatic start-point selection that generates wall-following paths offline on a grid map. In the second stage, to handle dynamic obstacle avoidance during point-to-point path planning, we propose a point-to-point path planning method based on the PSAC (prioritized experience replay soft actor critic) algorithm, which introduces a prioritized experience replay strategy into soft actor-critic (SAC) to achieve dynamic obstacle avoidance for the robot. The experimental section presents comparative experiments on wall-following path planning and on dynamic obstacle avoidance, verifying the effectiveness of the proposed methods for indoor wall-following and point-to-point path planning.
As an advanced combat weapon, Unmanned Aerial Vehicles (UAVs) have been widely used in military wars. In this paper, we formulate the Autonomous Navigation Control (ANC) problem of UAVs as a Markov Decision Process (MDP) and propose a novel Deep Reinforcement Learning (DRL) method to allow UAVs to perform dynamic target tracking tasks in large-scale unknown environments. To address the problem of limited training experience, the proposed Imaginary Filtered Hindsight Experience Replay (IFHER) generates successful episodes by plausibly imagining the target trajectory in a failed episode to augment the experiences. The well-designed goal, episode, and quality filtering strategies ensure that only high-quality augmented experiences are stored, while the sampling filtering strategy of IFHER ensures that these stored augmented experiences are fully learned according to their high priorities. By training in a complex environment constructed from the parameters of a real UAV, the proposed IFHER algorithm improves the convergence speed by 28.99% and the convergence result by 11.57% compared to the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Testing experiments carried out in environments of different complexities demonstrate the strong robustness and generalization ability of the IFHER agent. Moreover, the flight trajectory of the IFHER agent shows the superiority of the learned policy and the practical application value of the algorithm.
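IFHER builds on hindsight experience replay, whose core idea is to relabel a failed episode with a goal that was actually achieved, turning failure into useful training signal. The sketch below shows only that basic relabelling step; IFHER's trajectory imagination and its goal, episode, quality, and sampling filters are not reproduced, and the transition tuple layout is an assumption.

```python
def relabel_with_hindsight(episode, reward_fn):
    """Hindsight relabelling: treat the final achieved state as the goal.

    episode: list of (state, action, achieved_state) tuples from a failed run.
    reward_fn(achieved, goal): task reward, e.g. 0 if achieved == goal else -1.
    Returns augmented transitions whose goal is the state actually reached,
    so at least the final transition receives a success reward.
    """
    hindsight_goal = episode[-1][2]  # pretend this was the intended goal
    augmented = []
    for state, action, achieved in episode:
        reward = reward_fn(achieved, hindsight_goal)
        augmented.append((state, action, hindsight_goal, reward))
    return augmented
```

In a goal-conditioned tracking task, these relabelled transitions are stored alongside the originals so the agent learns from episodes that would otherwise carry only sparse failure rewards.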
Funding (first abstract): supported by Imperial College London, UK, King's College London, UK, and the Engineering and Physical Sciences Research Council (EPSRC), UK.
Funding (second abstract): supported by the National Natural Science Foundation of China (61751210, 61572441).
Funding (third abstract): supported in part by the National Key Research and Development Program of China (2021YFB1714700) and the National Natural Science Foundation of China (62022061, 6192100028).
Funding (seventh abstract): co-supported by the National Natural Science Foundation of China (Nos. 62003267 and 61573285), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2020JQ-220), the Open Project of Science and Technology on Electronic Information Control Laboratory, China (No. JS20201100339), and the Open Project of Science and Technology on Electromagnetic Space Operations and Applications Laboratory, China (No. JS20210586512).