
A Review of Developments in Reinforcement Learning for Multi-robot Systems

Cited by: 14
Abstract: Reinforcement learning (RL) is an effective means for multi-robot systems to adapt to complex and uncertain environments, and is considered one of the key technologies in designing intelligent systems. Starting from the basic ideas and theoretical framework of reinforcement learning, this survey focuses on inherent challenges such as partial observability, computational complexity, and convergence, and summarizes the state of the art and open problems in terms of communication during learning, policy negotiation, credit assignment, and interpretability. Applications in robot path planning and obstacle avoidance, unmanned aerial vehicles, robot soccer, and the multi-robot pursuit-evasion problem are introduced. Finally, frontier directions and development trends for multi-robot reinforcement learning, such as qualitative RL, fractal RL, and information-fusion RL, are discussed.
Source: Journal of Southwest Jiaotong University (EI, CSCD, Peking University Core), 2014, No. 6, pp. 1032-1044 (13 pages)
Funding: National Natural Science Foundation of China (61075104)
Keywords: multi-robot systems; reinforcement learning; Markov decision process; computational complexity; uncertainties
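The keywords above center on the Markov decision process and Q-learning foundations that the survey builds on. As a generic illustration of that single-agent framework (states, actions, rewards, and the Q-update), the following is a minimal sketch, not code from the paper; the corridor environment and all hyperparameters are illustrative assumptions:

```python
import random

# Minimal tabular Q-learning on a 5-state corridor MDP: a sketch of the
# basic RL framework (Watkins & Dayan's Q-learning) the survey discusses.
N_STATES = 5                 # states 0..4; state 4 is the goal
ACTIONS = (-1, +1)           # step left / step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(s, a):
    """One environment transition: move, clip to the corridor, reward at goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    reached_goal = (s2 == N_STATES - 1)
    return s2, (1.0 if reached_goal else 0.0), reached_goal

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(200):                      # cap episode length
            if rng.random() < EPS:                # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:                                 # greedy with random tie-break
                a = max(ACTIONS, key=lambda b: (Q[(s, b)], rng.random()))
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap on the best next-state value
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            if done:
                break
            s = s2
    return Q

Q = train()
# Greedy policy per non-terminal state; it should point right (+1) everywhere.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

The multi-robot setting surveyed above extends this scheme with joint action spaces, inter-robot communication, and credit assignment across agents, which is where the cited challenges of partial observability and computational complexity arise.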
