Multi-robot formation and navigation based on reinforcement learning

Cited by: 1

Abstract: When a multi-robot formation navigating an unknown environment encounters a long obstacle, the choice between circumventing it clockwise or counterclockwise strongly affects navigation efficiency. A three-level reinforcement learning method is proposed to address this problem. The high level learns online, using condition-behavior pairs, to adapt the circumvention direction to dynamic changes in the obstacles. The middle level uses a Role-Cross-Subsumption control framework to maintain the formation. The low level uses conventional offline reinforcement learning to acquire collision-avoidance rules. Simulation results show that, because only the high level learns online, the learning space is reduced and the learning time is shortened. The method provides an effective autonomous learning strategy for multi-robot formation and navigation in complex environments.
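
The high-level idea in the abstract, learning online which circumvention direction suits a given obstacle condition, can be illustrated with a minimal sketch. This is not the authors' implementation: the condition labels (`extends_left`, `extends_right`), the `detour_cost` toy environment, the reward (negative detour cost), and the values of `alpha` and `epsilon` are all assumptions chosen for illustration. The middle (formation-keeping) and low (collision-avoidance) levels are not modeled here.

```python
import random
from collections import defaultdict

BEHAVIORS = ("clockwise", "counterclockwise")


class ConditionBehaviorLearner:
    """Tabular value estimates over (condition, behavior) pairs (illustrative)."""

    def __init__(self, alpha=0.5, epsilon=0.1, seed=0):
        self.alpha = alpha            # learning rate (assumed value)
        self.epsilon = epsilon        # exploration probability (assumed value)
        self.q = defaultdict(float)   # value of each (condition, behavior) pair
        self.rng = random.Random(seed)

    def best(self, condition):
        # Greedy choice: the behavior with the highest current estimate.
        return max(BEHAVIORS, key=lambda b: self.q[(condition, b)])

    def choose(self, condition):
        # Epsilon-greedy: explore occasionally, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(BEHAVIORS)
        return self.best(condition)

    def update(self, condition, behavior, reward):
        # Move the estimate for this pair toward the observed reward.
        key = (condition, behavior)
        self.q[key] += self.alpha * (reward - self.q[key])


def detour_cost(condition, behavior):
    # Hypothetical environment: the obstacle extends further on one side,
    # so circumventing it in the direction of that side takes more steps.
    long_direction = {
        "extends_left": "counterclockwise",
        "extends_right": "clockwise",
    }
    return 10.0 if behavior == long_direction[condition] else 3.0


def train(episodes=200, seed=0):
    learner = ConditionBehaviorLearner(seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(episodes):
        condition = env_rng.choice(["extends_left", "extends_right"])
        behavior = learner.choose(condition)
        # Reward is the negative detour cost: shorter detours are better.
        learner.update(condition, behavior, -detour_cost(condition, behavior))
    return learner


if __name__ == "__main__":
    learner = train()
    for condition in ("extends_left", "extends_right"):
        print(condition, "->", learner.best(condition))
```

Because only the two-entry choice per condition is learned online in this sketch, the table stays small, which mirrors the abstract's point that restricting online learning to the high level reduces the learning space.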

Authors: Zhao Jie, Jiang Jian, Zang Xizhe

Affiliation: Robotics Institute, Harbin Institute of Technology

Source: Journal of Liaoning Technical University (Natural Science), 2007, No. 6, pp. 915-918 (4 pages). Indexed in: EI, CAS, Peking University Core Journals.

Funding: Program for Changjiang Scholars and Innovative Research Team in University, Ministry of Education (IRT0423)

Keywords: multi-robot systems; reinforcement learning; formation and navigation; role

Classification: TP24 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

References (7)

1. Zhang Hansong, Jia Ruiqing, Wang Tingjun. Robot obstacle avoidance algorithm based on actual error function and membership function [J]. Journal of Liaoning Technical University (Natural Science), 2006, 25(4): 588-591. Cited by: 5
2. Meng Jianghua, Zhu Jihong, Sun Zengqi. A new sensor-based path planning method for mobile robots in unknown environments [J]. Robot, 2005, 27(4): 319-324. Cited by: 19
3. Mataric M J. Reinforcement learning in the multi-robot domain [J]. Autonomous Robots, 1997, 4(1): 73-83.
4. Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning [J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
5. Sutton R S, Precup D, Singh S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1): 181-211.
6. Macek K, Petrovic I, Peric N. A reinforcement learning approach to obstacle avoidance of mobile robots [C]. 7th International Workshop on Advanced Motion Control. Maribor, Slovenia: IEEE Press, 2002: 462-466.
7. Werger B, Mataric M. Broadcast of local eligibility for multi-target observation [C]. 5th International Symposium on Distributed Autonomous Robotic Systems. Tennessee, USA: Springer-Verlag, 2001: 347-356.
