Abstract
Reinforcement learning (RL) is a machine learning method that does not require training data to be collected in advance. It searches for an optimal policy through continuous interaction between the agent and the environment, and is an important approach to sequential decision-making problems. Combined with deep learning (DL), deep reinforcement learning (DRL) gains strong perception and decision-making capabilities and is widely used across many fields to solve complex decision-making problems. Off-policy reinforcement learning stores and replays interaction experience, which separates exploration from exploitation and makes it easier to find the globally optimal solution. Using experience reasonably and efficiently is therefore key to improving the efficiency of off-policy reinforcement learning methods. This paper first introduces the basic theory of reinforcement learning, then briefly reviews on-policy and off-policy reinforcement learning algorithms. It next presents the two mainstream approaches to the experience replay (ER) problem, namely experience utilization and experience augmentation. Finally, it summarizes related research and discusses future directions.
Authors
HU Zi-Jian (胡子剑)
GAO Xiao-Guang (高晓光)
WAN Kai-Fang (万开方)
ZHANG Le-Tian (张乐天)
WANG Qiang-Long (汪强龙)
NERETIN Evgeny
Affiliations: School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China; School of Foreign Languages, Xidian University, Xi'an 710126, China; School of Robotic and Intelligent Systems, Moscow Aviation Institute (National Research University), Moscow 125993, Russia
Source
Acta Automatica Sinica (《自动化学报》)
EI
CAS
CSCD
PKU Core Journal (北大核心)
2023, Issue 11, pp. 2237-2256 (20 pages)
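Funding
Supported by:
National Natural Science Foundation of China (62003267, 61573285)
Fundamental Research Funds for the Central Universities (G2022KY0602)
Key Laboratory of Electromagnetic Space Operations and Applications (2022ZX0090)
Xi'an Science and Technology Plan Project, Key Core Technology Research Program (21RGZN0016)
Key Research and Development Program of Shaanxi Province (2023-GHZD-33)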
Keywords
Deep reinforcement learning (DRL)
off-policy
experience replay (ER)
artificial intelligence