
Design of a New Reward Function Based on Multi-Agent Reinforcement Learning (cited by: 4)

A Reward Function Based on Reinforcement Learning of Multi-agent
Abstract: This work aims to improve the performance of reinforcement learning algorithms in multi-agent systems. Keepaway, a typical multi-agent platform, has the characteristic that every episode ends in failure. The pole-balancing system is a single-agent system with the same characteristic, and its reward function inspired the design of a new, punitive reward function for Keepaway. The new reward function assigns zero reward when the system is in a successful state and a negative penalty when it reaches a failure state. The Sarsa(λ) algorithm using the new reward function was successfully applied to the Keepaway platform. Simulation results show that, under certain parameter settings, the new reward function effectively improves the performance of the reinforcement learning algorithm on Keepaway and yields a better final learning result.
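The punitive reward scheme in the abstract (zero in successful states, a penalty on failure) can be illustrated with a minimal sketch. It is not the authors' implementation. It uses tabular Sarsa(λ) with accumulating eligibility traces on a hypothetical toy task, `ToyHoldTask`, which, like Keepaway, always ends in failure. All names (`punitive_reward`, `FAIL_PENALTY`, `sarsa_lambda_update`, `train`) and all parameter values (penalty of -1, α, γ, λ, ε) are assumptions chosen for illustration. Keepaway learners typically use function approximation such as tile coding rather than a lookup table. Why the scheme works: if an episode fails at step T, every reward is 0 except the final -1, so the discounted return from the start is -γ^(T-1). With γ < 1, this return rises toward 0 as T grows, so maximizing return means delaying failure. With γ = 1, every episode would return -1 regardless of its length.

```python
import random
from collections import defaultdict

# Hypothetical penalty magnitude; the paper's tuned parameters are not reproduced here.
FAIL_PENALTY = -1.0


def punitive_reward(failed: bool) -> float:
    """Punitive reward: 0 in non-failure (successful) states, negative on failure."""
    return FAIL_PENALTY if failed else 0.0


class ToyHoldTask:
    """Hypothetical stand-in for Keepaway: every episode eventually ends in failure.

    State = pressure level 0..max_pressure. Action 0 ("hold") raises the pressure
    and the failure risk; action 1 ("pass") resets the pressure but carries a
    fixed failure risk. The agent's goal is to delay failure as long as possible.
    """

    def __init__(self, rng: random.Random, max_pressure: int = 4):
        self.rng = rng
        self.max_pressure = max_pressure
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, bool]:
        if action == 0:  # hold: risk grows with pressure
            fail_prob = 0.1 * (self.state + 1)
            self.state = min(self.state + 1, self.max_pressure)
        else:  # pass: fixed risk, pressure resets
            fail_prob = 0.15
            self.state = 0
        failed = self.rng.random() < fail_prob
        return self.state, failed


def epsilon_greedy(Q, s, n_actions, epsilon, rng):
    """Pick a random action with prob. epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [Q[(s, a)] for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def sarsa_lambda_update(Q, e, s, a, r, s2, a2, alpha, gamma, lam, terminal):
    """One Sarsa(lambda) step with accumulating eligibility traces."""
    target = r if terminal else r + gamma * Q[(s2, a2)]
    delta = target - Q[(s, a)]
    e[(s, a)] += 1.0  # accumulate the trace of the visited pair
    for key in list(e):
        Q[key] += alpha * delta * e[key]
        e[key] *= gamma * lam  # decay all traces


def train(n_episodes=500, alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1,
          seed=0, max_steps=1000):
    """Train on ToyHoldTask; return the Q-table and per-episode lengths (steps to failure)."""
    rng = random.Random(seed)
    env = ToyHoldTask(rng)
    Q = defaultdict(float)
    lengths = []
    for _ in range(n_episodes):
        e = defaultdict(float)  # traces are reset at the start of each episode
        s = env.reset()
        a = epsilon_greedy(Q, s, 2, epsilon, rng)
        for t in range(1, max_steps + 1):
            s2, failed = env.step(a)
            r = punitive_reward(failed)
            if failed:
                sarsa_lambda_update(Q, e, s, a, r, None, None, alpha, gamma, lam, True)
                break
            a2 = epsilon_greedy(Q, s2, 2, epsilon, rng)
            sarsa_lambda_update(Q, e, s, a, r, s2, a2, alpha, gamma, lam, False)
            s, a = s2, a2
        lengths.append(t)
    return Q, lengths


if __name__ == "__main__":
    Q, lengths = train()
    print("mean episode length, first 50:", sum(lengths[:50]) / 50)
    print("mean episode length, last 50:", sum(lengths[-50:]) / 50)
```

Episode length (steps until failure) serves as the performance measure here. This mirrors Keepaway, where a longer possession time is better.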
Source: Control Engineering of China (《控制工程》), 2009, No. 2, pp. 239-242 (4 pages). Indexed in CSCD; Peking University core journal.
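Funding: Key Science and Technology Development Fund of the Beijing Municipal Education Commission (EM200610005019); Doctoral Research Start-up Fund of Beijing University of Technology (52002011200708)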
Keywords: Keepaway; multi-agent system; reinforcement learning; reward function; RoboCup