
Bayesian Optimization-Based Generalized Fixed-Point Solution Approximation in Reinforcement Learning
Abstract: A generalized fixed-point solution model was proposed to address the question of which reinforcement-learning fixed-point solution is better. The design extended fixed-point solutions using n-step bootstrapping and constructed fixed-point solutions based on linear interpolation. Applying this design to the mature CBMPI algorithm framework yielded the CBMPI(n, β) algorithm based on generalized fixed points. To address the issue of expressing and approximating the optimal solution, parameter optimization of the generalized fixed-point solution based on Bayesian optimization was proposed, together with higher-quality solutions obtained through ensemble learning. The effectiveness of the proposed algorithms was verified in the classical 10×10 Tetris game environment. Experimental results showed that the generalized fixed-point construction based on linear interpolation outperformed the traditional n-step fixed point, and that its performance was strongly associated with its hyperparameters, the step length n and the interpolation parameter β. Over 100 games of Tetris, an average score of 4388.3 was achieved, indicating that Bayesian optimization can identify multiple sets of well-performing policies. Policy ensemble and value-function ensemble over the policy parameters of four well-performing generalized fixed points (the results of Bayesian optimization) produced higher-quality solutions, with average scores reaching 4526.29 and 4579.74 respectively. These results demonstrated that policy ensemble and value-function ensemble based on generalized fixed points marginally improved on the scores of the individual generalized fixed-point policies, confirming that ensemble learning can be used to find higher-quality solutions.
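The abstract describes the generalized fixed point as a linear interpolation controlled by β over n-step bootstrapped solutions, but does not give the exact formula. One plausible reading, sketched below as an assumption, is a blend of the n-step and (n+1)-step bootstrapped return targets:

```python
def interpolated_nstep_target(rewards, values, n, beta, gamma=1.0):
    """Hypothetical sketch of a generalized fixed-point target:
    a linear interpolation (parameter beta) between the n-step and
    (n+1)-step bootstrapped returns. This is one plausible reading of
    the CBMPI(n, beta) construction, not the paper's exact formula.

    rewards: r_t, ..., r_{t+n}            (length >= n + 1)
    values:  V(s_{t+1}), ..., V(s_{t+n+1}) (length >= n + 1)
    """
    def g(k):
        # k-step bootstrapped return: discounted rewards plus bootstrap value
        ret = sum(gamma**i * rewards[i] for i in range(k))
        return ret + gamma**k * values[k - 1]

    return (1.0 - beta) * g(n) + beta * g(n + 1)
```

With β = 0 this reduces to the traditional n-step target, and β = 1 gives the (n+1)-step target, so β interpolates between adjacent fixed points.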
Authors: CHEN Xingguo; LÜ Yongzhou; GONG Yu; CHEN Yaoxiong (Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, Jiangsu, China; Faculty of Electronic Information Engineering, Huaiyin Institute of Technology, Huaian 223003, Jiangsu, China)
Source: Journal of Shandong University (Engineering Science), 2024, No. 4, pp. 21-34 (indexed in CAS, CSCD, and the Peking University Core list)
Funding: National Natural Science Foundation of China (62276142, 62206133, 62202240, 62192783); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100905); Jiangsu Province Industry Foresight and Key Core Technology Project (BE2021028); Shenzhen Central Government Guided Local Science and Technology Development Fund (2021Szvup056)
Keywords: reinforcement learning; value function approximation; fixed point; Bayesian optimization; Tetris
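The ensemble step described in the abstract combines several well-performing parameter sets found by Bayesian optimization. The exact aggregation scheme is not given there, so the averaging of value estimates and majority vote over policy actions below are assumptions, sketched minimally:

```python
from collections import Counter

def value_ensemble(value_fns, state):
    # Assumed scheme: average the estimates of several learned value functions.
    return sum(v(state) for v in value_fns) / len(value_fns)

def policy_ensemble_action(policies, state):
    # Assumed scheme: majority vote over the actions chosen by several policies.
    votes = Counter(p(state) for p in policies)
    return votes.most_common(1)[0][0]
```

In a Tetris setting, `state` would be a board configuration and each policy a greedy policy derived from one of the four parameter sets selected by Bayesian optimization.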
