Abstract
In recent years, evolution strategies have been widely applied in deep reinforcement learning because they are derivative-free and parallelize efficiently. However, traditional deep reinforcement learning methods based on evolution strategies suffer from slow learning, a tendency to converge to local optima, and poor robustness. To address these problems, an adaptive noise-based evolutionary reinforcement learning method with maximum entropy is proposed. First, an improved (canonical) evolution strategy is introduced that, on top of favoring well-performing individuals ("survival of the fittest"), strengthens the elimination of poorly performing ones, thereby speeding up the convergence of evolutionary reinforcement learning. Second, a maximum-entropy regularization term on the policy is added to the objective function, which keeps the policy moderately stochastic and encourages the agent to explore new policies. Third, an adaptive noise control scheme is proposed that adjusts the search range of the evolution strategy according to the current state of evolution, which reduces the dependence on prior knowledge and improves the robustness of the algorithm. Experimental results show that, compared with traditional methods, the proposed method achieves clear improvements in learning speed, convergence toward the optimum, and robustness.
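The abstract names three ingredients but gives no update rules, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a toy softmax policy over a few bandit arms, log-rank weights over the best `mu` of `lam` perturbations (the remaining individuals are discarded), an entropy bonus with coefficient `alpha`, and a success-rate rule for the noise scale `sigma`. The function names, the 0.2 threshold, and the 1.1/0.9 factors are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fitness(theta, rewards, alpha):
    """Expected reward of the softmax policy plus an entropy bonus."""
    probs = softmax(theta)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return expected + alpha * entropy(probs)


def evolve(rewards, generations=200, lam=20, mu=5,
           sigma=0.5, alpha=0.05, seed=0):
    """Toy (mu, lam) evolution strategy with adaptive noise (assumed rules)."""
    rng = random.Random(seed)
    dim = len(rewards)
    theta = [0.0] * dim

    # Log-rank weights for the mu best individuals; the rest get no weight.
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    weights = [w / sum(raw) for w in raw]

    for _ in range(generations):
        parent_fit = fitness(theta, rewards, alpha)

        # Sample lam Gaussian perturbations and score each candidate.
        scored = []
        for _ in range(lam):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            cand = [t + sigma * e for t, e in zip(theta, eps)]
            scored.append((fitness(cand, rewards, alpha), eps))
        scored.sort(key=lambda s: s[0], reverse=True)

        # Move the parent along the weighted mean of the elite perturbations.
        for w, (_, eps) in zip(weights, scored[:mu]):
            for j in range(dim):
                theta[j] += sigma * w * eps[j]

        # Assumed adaptation rule: widen the search while many offspring
        # beat the parent, narrow it otherwise, within fixed bounds.
        success_rate = sum(1 for f, _ in scored if f > parent_fit) / lam
        sigma *= 1.1 if success_rate > 0.2 else 0.9
        sigma = min(max(sigma, 1e-3), 2.0)

    return theta, sigma


if __name__ == "__main__":
    arm_rewards = [0.1, 0.9, 0.3]  # hypothetical bandit payoffs
    theta, sigma = evolve(arm_rewards)
    print(softmax(theta), sigma)
```

In this sketch `alpha` trades off reward against policy randomness: a larger value keeps the action distribution flatter, while `alpha = 0` reduces the objective to plain expected reward.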
Authors
WANG Jun-Yi (王君逸)
WANG Zhi (王志)
LI Hua-Xiong (李华雄)
CHEN Chun-Lin (陈春林)
Department of Control Science and Intelligence Engineering, Nanjing University, Nanjing 210008
Source
Acta Automatica Sinica (《自动化学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2023, Issue 1, pp. 54-66 (13 pages)
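Funding
Supported by the National Natural Science Foundation of China (62006111, 62073160, 62176116) and the Natural Science Foundation of Jiangsu Province (BK20200330).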
Keywords
Deep reinforcement learning
evolution strategies
evolutionary reinforcement learning
maximum entropy
adaptive noise