
Adaptive Noise-based Evolutionary Reinforcement Learning With Maximum Entropy

Cited by: 2

Abstract: Recently, evolution strategies have been widely investigated in the field of deep reinforcement learning due to their promising properties of derivative-free optimization and high parallelization efficiency. However, traditional evolutionary reinforcement learning methods suffer from several problems, including slow learning speed, a tendency toward local optima, and poor robustness. To tackle these problems, a systematic method is proposed, named adaptive noise-based evolutionary reinforcement learning with maximum entropy. First, an improved evolution strategy is introduced that enhances the influence of well-performing individuals and weakens the impact of poorly performing ones, thus improving the learning speed of evolutionary reinforcement learning. Second, a regularization term that maximizes the policy entropy is incorporated into the objective function, which ensures moderate stochasticity of actions and encourages exploration of new promising solutions. Third, an adaptive noise-control scheme is proposed that automatically adjusts the search range of the evolution strategy according to the current evolutionary situation, which reduces the dependence on prior knowledge and improves the robustness of evolution. Experimental results show that, compared to traditional approaches, this method achieves faster learning, better convergence toward the global optimum, and improved robustness.
Authors: WANG Jun-Yi, WANG Zhi, LI Hua-Xiong, CHEN Chun-Lin (Department of Control Science and Intelligence Engineering, Nanjing University, Nanjing 210008)
Source: Acta Automatica Sinica (《自动化学报》), 2023, Issue 1, pp. 54-66 (13 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list (北大核心).
Funding: Supported by the National Natural Science Foundation of China (62006111, 62073160, 62176116) and the Natural Science Foundation of Jiangsu Province (BK20200330).
Keywords: Deep reinforcement learning, evolution strategies, evolutionary reinforcement learning, maximum entropy, adaptive noise
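
The abstract above describes three algorithmic ingredients: rank-based recombination that also penalizes the worst individuals, an entropy bonus added to the fitness, and noise adaptation driven by the current evolutionary situation. Below is a minimal, illustrative NumPy sketch of how these could fit together. It is not the authors' implementation: the rank-weighting form, the entropy coefficient alpha, the callables evaluate_return and log_std_of, and the up/down factors in adapt_sigma are all assumptions made for illustration.

    # Illustrative sketch only (not the paper's code); all names and
    # constants below are assumptions, not the authors' choices.
    import numpy as np

    def rank_weights(n):
        """Symmetric rank utilities: positive for the best-ranked individuals,
        negative for the worst, so good perturbations are reinforced and bad
        ones actively pushed away from (assumed weighting form)."""
        w = (n - 1) / 2.0 - np.arange(n)   # linear, best rank first
        return w / np.abs(w).sum()         # normalize total magnitude to 1

    def entropy_bonus(log_std):
        """Differential entropy of a diagonal Gaussian policy; added to the
        fitness as the maximum-entropy regularizer."""
        return 0.5 * np.sum(np.log(2.0 * np.pi * np.e) + 2.0 * log_std)

    def es_step(theta, sigma, evaluate_return, log_std_of,
                alpha=0.01, pop_size=50, lr=0.05, rng=None):
        """One generation: perturb theta, score each candidate with
        return + alpha * entropy, recombine perturbations by rank weight."""
        rng = rng if rng is not None else np.random.default_rng()
        eps = rng.standard_normal((pop_size, theta.size))
        fitness = np.array([
            evaluate_return(theta + sigma * e)                    # return
            + alpha * entropy_bonus(log_std_of(theta + sigma * e))
            for e in eps
        ])
        order = np.argsort(-fitness)                              # best first
        grad = (rank_weights(pop_size)[:, None] * eps[order]).sum(axis=0) / sigma
        return theta + lr * grad, float(fitness.max())

    def adapt_sigma(sigma, improved, up=1.1, down=0.9, lo=1e-3, hi=1.0):
        """Assumed noise-control rule: narrow the search while fitness keeps
        improving, widen it when progress stalls. The paper adapts sigma to
        the current evolutionary situation; this specific rule is only one
        plausible instantiation."""
        return float(np.clip(sigma * (down if improved else up), lo, hi))

A training loop would alternate the two steps: call es_step, compare the returned best fitness against the previous generation's, and feed that comparison into adapt_sigma before sampling the next population.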
