
A Multiple-Goal Sarsa(λ) Algorithm Based on Lost Reward of Greatest Mass (cited by: 3)
Abstract: For the multiple-goal reinforcement learning problem typified by RoboCup, a multiple-goal reinforcement learning algorithm based on the lost reward of greatest mass, LRGM-Sarsa(λ), is proposed. The algorithm estimates the lost reward of greatest mass for each sub-goal and, balancing the long-term rewards of the sub-goals, selects the best joint action to produce a composite policy. For training each individual sub-goal, a Sarsa(λ) algorithm based on the B error function, an improvement on the MSBR error function, is adopted, and the action-selection probability function and the step-size parameter α are optimized accordingly. The B error function guarantees convergence of the value prediction under non-linear function approximation, resolving the instability and non-convergence that reinforcement learning otherwise exhibits in that setting. The algorithm is applied to training the local shooting policy in RoboCup 2D, where experimental results show that it is more stable and converges faster, demonstrating its effectiveness.
Source: Acta Electronica Sinica (《电子学报》), 2013, No. 8, pp. 1469-1473 (5 pages); indexed by EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (Nos. 61070223, 61103045, 61272005, 61170020); Natural Science Foundation of Jiangsu Province (No. BK2012616); Natural Science Research Program of Jiangsu Higher Education Institutions (Nos. 09KJA520002, 09KJB520012); Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (No. 93K172012K04).
Keywords: multiple-goal; adaptive Sarsa(λ); lost reward of greatest mass; reinforcement learning; RoboCup 2D
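The abstract describes two coupled mechanisms: per-sub-goal Sarsa(λ) learners with eligibility traces, and a greatest-mass rule that combines the sub-goals' value estimates to select one joint action. The sketch below is a minimal tabular illustration in Python of that structure, not the paper's method: it uses the classical greatest-mass criterion (summing Q-values across sub-goals) as a stand-in for the paper's lost-reward-of-greatest-mass criterion, and it does not reproduce the B error function, the non-linear function approximation, or the adaptive step size. All names and sizes (N_GOALS, q, trace, the ε-greedy constant) are hypothetical.

```python
import numpy as np

# Hypothetical sizes and constants; the paper's RoboCup shooting task
# has its own state/action encoding, which is not reproduced here.
N_GOALS, N_STATES, N_ACTIONS = 2, 100, 5
ALPHA, GAMMA, LAM, EPSILON = 0.1, 0.9, 0.8, 0.1

# One Q-table and one eligibility-trace table per sub-goal.
q = np.zeros((N_GOALS, N_STATES, N_ACTIONS))
trace = np.zeros_like(q)

def select_action(state: int) -> int:
    """Greatest-mass action selection: rank each action by the sum of
    its Q-values over all sub-goals, then act epsilon-greedily."""
    if np.random.rand() < EPSILON:
        return int(np.random.randint(N_ACTIONS))
    return int(np.argmax(q[:, state, :].sum(axis=0)))

def sarsa_lambda_update(s, a, rewards, s_next, a_next):
    """One on-policy Sarsa(lambda) backup per sub-goal; `rewards`
    holds one scalar reward per sub-goal for this transition."""
    for g in range(N_GOALS):
        delta = rewards[g] + GAMMA * q[g, s_next, a_next] - q[g, s, a]
        trace[g, s, a] += 1.0           # accumulating trace
        q[g] += ALPHA * delta * trace[g]
        trace[g] *= GAMMA * LAM         # decay all traces
```

In the paper itself the single-goal learners rely on the B error function to keep value prediction stable under non-linear function approximation, and the step-size parameter α is adapted during training; a fixed-α tabular sketch like this one does not capture either refinement.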

