A Survey of Research on Experience Replay in Off-Policy Deep Reinforcement Learning
1
Authors: Hu Zijian, Gao Xiaoguang, Wan Kaifang, Zhang Letian, Wang Qianglong, Neretin Evgeny 《Acta Automatica Sinica》 (自动化学报) EI CAS CSCD PKU Core, 2023, Issue 11, pp. 2237-2256 (20 pages)
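As a machine learning method that does not require training data to be obtained in advance, reinforcement learning (RL) searches for the optimal policy through continuous interaction between an agent and its environment, and is an important approach to sequential decision-making problems. Combined with deep learning (DL), deep reinforcement learning (DRL) has both strong perception and strong decision-making capabilities, and is widely applied in many fields to solve complex decision problems. Off-policy reinforcement learning stores and replays interaction experience, which separates exploration from exploitation and makes it easier to find the global optimum. Using experience reasonably and efficiently is the key to improving the efficiency of off-policy reinforcement learning methods. The paper first introduces the basic theory of reinforcement learning, then briefly introduces on-policy and off-policy reinforcement learning algorithms, next presents the two mainstream solutions to the experience replay (ER) problem, namely experience utilization and experience augmentation, and finally summarizes related research and discusses future directions.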
Keywords: deep reinforcement learning, off-policy, experience replay, artificial intelligence
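The abstract above describes off-policy learning as storing interaction experience and replaying it later. A minimal sketch of that idea is a fixed-capacity buffer with uniform sampling. The names `Transition` and `ReplayBuffer` and the field layout are assumptions made for illustration and are not taken from the surveyed paper.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One step of agent-environment interaction (hypothetical layout)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-capacity experience store with uniform random sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Never request more items than are currently stored.
        k = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), k)

    def __len__(self) -> int:
        return len(self._storage)
```

Because sampling is decoupled from the policy that produced the data, the learner can reuse old transitions many times. Uniform sampling is only the simplest choice; the survey's "experience utilization" category covers methods that choose which stored experience to replay.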
Autonomous air combat decision-making of UAV based on parallel self-play reinforcement learning (Cited by: 1)
2
Authors: Bo Li, Jingyi Huang, Shuangxia Bai, Zhigang Gan, Shiyang Liang, Neretin Evgeny, Shouwen Yao 《CAAI Transactions on Intelligence Technology》 SCIE EI, 2023, Issue 1, pp. 64-81 (18 pages)
To address the problem of manoeuvring decision-making in UAV air combat, this study establishes a one-to-one air combat model, defines missile attack areas, and uses the non-deterministic policy Soft Actor-Critic (SAC) algorithm in deep reinforcement learning to construct a decision model that realizes the manoeuvring process. At the same time, the complexity of the proposed algorithm is calculated, and the stability of the closed-loop air combat decision-making system controlled by a neural network is analysed with a Lyapunov function. This study defines the UAV air combat process as a gaming process and proposes a Parallel Self-Play training SAC algorithm (PSP-SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm can realize sample sharing and policy sharing across multiple combat environments and can significantly improve the generalisation ability of the model compared to independent training.
Keywords: air combat decision, deep reinforcement learning, parallel self-play, SAC algorithm, UAV
UAV Maneuvering Decision-Making Algorithm Based on Twin Delayed Deep Deterministic Policy Gradient Algorithm (Cited by: 5)
3
Authors: Bai Shuangxia, Song Shaomei, Liang Shiyang, Wang Jianmei, Li Bo, Neretin Evgeny 《Journal of Artificial Intelligence and Technology》 2022, Issue 1, pp. 16-22 (7 pages)
Aiming at intelligent decision-making of unmanned aerial vehicles (UAVs) based on situation information in air combat, this paper proposes a novel maneuvering decision method based on deep reinforcement learning. The autonomous maneuvering model of the UAV is established as a Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm in deep reinforcement learning are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation results show that, compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence speed and is more suitable for solving combat problems. The algorithm proposed in this paper enables UAVs to autonomously make maneuvering decisions based on situation information such as position, speed, and relative azimuth, adjust their actions to approach and successfully strike the enemy, and so provides a new method for UAVs to make intelligent maneuvering decisions during air combat.
Keywords: air combat, DDPG, maneuvering decision-making, TD3
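The abstract above compares TD3 with DDPG but does not spell out how they differ. As a hedged illustration, the sketch below contrasts the bootstrapped critic targets of the two algorithms as they are generally defined: DDPG uses a single target critic, while TD3 takes the minimum of two target critics and adds clipped noise to the target action. The function names, scalar inputs, and default values (`gamma`, `noise_std`, `noise_clip`, action bounds) are assumptions for illustration and are not taken from the listed paper.

```python
import random
from typing import Optional


def ddpg_target(reward: float, done: bool, q_next: float,
                gamma: float = 0.99) -> float:
    """DDPG critic target: bootstrap from a single target critic."""
    return reward + gamma * (1.0 - float(done)) * q_next


def td3_target(reward: float, done: bool, q1_next: float, q2_next: float,
               gamma: float = 0.99) -> float:
    """TD3 critic target: bootstrap from the smaller of two target critics
    (clipped double Q-learning) to curb value overestimation."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def smoothed_target_action(action: float, noise_std: float = 0.2,
                           noise_clip: float = 0.5, low: float = -1.0,
                           high: float = 1.0,
                           rng: Optional[random.Random] = None) -> float:
    """TD3 target policy smoothing: add clipped Gaussian noise to the
    target action, then clip the result to the valid action range."""
    rng = rng or random.Random()
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, action + noise))
```

TD3's third ingredient, updating the actor less often than the critics, is a training-loop schedule rather than a formula and is omitted here.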
Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments
4
Authors: Zijian HU, Xiaoguang GAO, Kaifang WAN, Neretin EVGENY, Jinliang LI 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD, 2023, Issue 5, pp. 377-391 (15 pages)
As an advanced combat weapon, Unmanned Aerial Vehicles (UAVs) have been widely used in military wars. In this paper, we formulate the Autonomous Navigation Control (ANC) problem of UAVs as a Markov Decision Process (MDP) and propose a novel Deep Reinforcement Learning (DRL) method that allows UAVs to perform dynamic target tracking tasks in large-scale unknown environments. To solve the problem of limited training experience, the proposed Imaginary Filtered Hindsight Experience Replay (IFHER) generates successful episodes by reasonably imagining the target trajectory in the failed episode to augment the experiences. The well-designed goal, episode, and quality filtering strategies ensure that only high-quality augmented experiences are stored, while the sampling filtering strategy of IFHER ensures that these stored augmented experiences can be fully learned according to their high priorities. Trained in a complex environment constructed from the parameters of a real UAV, the proposed IFHER algorithm improves the convergence speed by 28.99% and the convergence result by 11.57% compared to the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Testing experiments carried out in environments of different complexity demonstrate the strong robustness and generalization ability of the IFHER agent. Moreover, the flight trajectory of the IFHER agent shows the superiority of the learned policy and the practical application value of the algorithm.
Keywords: Artificial intelligence, Autonomous navigation control, Deep reinforcement learning, Hindsight experience replay, UAV
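The abstract above builds on hindsight experience replay, which turns a failed episode into a successful one by changing the goal after the fact. The sketch below shows plain hindsight relabeling with the "final" strategy only; it is not the IFHER method, which additionally imagines a target trajectory and filters the augmented experiences. The 2D positions, the sparse reward of 0 or -1, and all names are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple

Position = Tuple[float, float]


class GoalTransition(NamedTuple):
    """Goal-conditioned transition (hypothetical layout)."""
    state: Position
    action: int
    next_state: Position
    goal: Position
    reward: float


def sparse_reward(achieved: Position, goal: Position,
                  tol: float = 1e-6) -> float:
    """Return 0.0 when the goal is reached within tol, otherwise -1.0."""
    dist_sq = (achieved[0] - goal[0]) ** 2 + (achieved[1] - goal[1]) ** 2
    return 0.0 if dist_sq <= tol ** 2 else -1.0


def relabel_with_final_goal(
        episode: List[GoalTransition]) -> List[GoalTransition]:
    """Pretend the last state reached was the goal all along, then
    recompute every reward against that substituted goal."""
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal,
                   reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]
```

In a typical setup both the original and the relabeled transitions would be stored in the replay buffer, so the agent sees at least some rewarded examples even when it never reaches the real goal.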