
Function approximation method based on weights gradient descent in reinforcement learning
Abstract  Function approximation is a research focus in reinforcement learning, as it can effectively handle problems with large-scale, continuous state and action spaces. Although gradient-descent-based function approximation is one of the most widely used methods in reinforcement learning, it places high demands on the step-size parameter: an inappropriate value can lead to slow convergence, unstable convergence, or even divergence. To address these issues, the weight-update rule of the temporal-difference (TD) algorithm with function approximation was improved by building on the least-squares method and gradient descent. The least-squares method was applied to the value function to solve for a set of weights, the ideas of TD and gradient descent were combined to compute the error between those weights and the current weights, and this error was used to update the weights directly, yielding the proposed weights gradient descent (WGD) method. WGD updates the weights in a new manner, effectively reduces the algorithm's consumption of computing resources, can be used to improve other gradient-descent-based function approximation algorithms, and is therefore applicable to a wide range of gradient-descent-based reinforcement learning algorithms. Experiments show that WGD can adjust parameters within a wider space, effectively reduces the possibility of divergence, and improves convergence speed while maintaining good convergence.
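
The update rule described in the abstract can be sketched in code. The following is a minimal illustrative sketch only, assuming linear value approximation V(s) = w·φ(s) and an LSTD-style least-squares solve; the function names, parameters, and the exact form of the weight error are assumptions made for illustration and are not taken from the paper.

import numpy as np

# Hypothetical sketch of the idea described in the abstract: solve for a
# target weight vector by least squares (LSTD-style), then update the
# current weights directly along the error between the two weight vectors.

def lstd_weights(phi, phi_next, rewards, gamma=0.99, reg=1e-3):
    # phi, phi_next: (T, d) feature matrices for s_t and s_{t+1}; rewards: (T,)
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(d)  # regularized LSTD matrix
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

def wgd_update(w, w_ls, alpha=0.1):
    # Error between the least-squares weights and the current weights,
    # used to update the weights directly (per the abstract's description).
    return w + alpha * (w_ls - w)

# Toy usage with random data, purely illustrative.
rng = np.random.default_rng(0)
T, d = 200, 8
phi = rng.normal(size=(T, d))
phi_next = rng.normal(size=(T, d))
r = rng.normal(size=T)
w = np.zeros(d)
w_ls = lstd_weights(phi, phi_next, r)
for _ in range(50):
    w = wgd_update(w, w_ls)

Read this way, the step size acts on the error between weight vectors rather than on per-sample TD errors, which would be consistent with the abstract's claim that parameters can be tuned in a wider space with a lower risk of divergence.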
Authors  秦晓燕 (QIN Xiaoyan), 刘禹含 (LIU Yuhan), 徐云龙 (XU Yunlong), 李斌 (LI Bin) — School of Information and Software, Global Institute of Software Technology, Suzhou 215163, China; University of Waterloo, Waterloo N2L 3G4, Canada; Applied Technology College, Soochow University, Suzhou 215325, China; School of Computer Science and Technology, Soochow University, Suzhou 215325, China
Source  Chinese Journal of Network and Information Security (《网络与信息安全学报》), 2023, No. 4, pp. 16-28 (13 pages)
Funding  National Natural Science Foundation of China (61772355, 61702055, 61876217, 62176175); Major Natural Science Research Project of Jiangsu Higher Education Institutions (18KJA520011, 17KJA520004); Suzhou Applied Basic Research Program, Industrial Part (SYG201422); Jiangsu High-End Training Program for Professional Leaders of Higher Vocational College Teachers (2021GRFX052); Priority Academic Program Development of Jiangsu Higher Education Institutions; Jiangsu Vocational Education Software Technology "Double-Qualified" Famous Teacher Studio Project
Keywords  function approximation; reinforcement learning; gradient descent; least squares; weights gradient descent
