
A model-based approximate λ-policy iteration approach to online evasive path planning and the video game Ms. Pac-Man

Abstract: This paper presents a model-based approximate λ-policy iteration approach using temporal differences for optimizing paths online for a pursuit-evasion problem, where an agent must visit several target positions within a region of interest while simultaneously avoiding one or more actively pursuing adversaries. This method is relevant to applications such as robotic path planning, mobile-sensor applications, and path exposure. The methodology utilizes cell decomposition to construct a decision tree and implements a temporal difference-based approximate λ-policy iteration to combine online learning with prior knowledge through modeling, in order to minimize the risk of being caught by an adversary while maximizing a reward associated with visiting target locations. Online learning and frequent decision-tree updates allow the algorithm to adapt quickly to unexpected adversary movements or to dynamic environments. The approach is illustrated through a modified version of the video game Ms. Pac-Man, which is shown to be a benchmark example of the pursuit-evasion problem. The results show that the approach presented in this paper outperforms several other methods as well as most human players.
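The abstract only sketches the algorithm. As a rough illustration of the core idea (TD(λ) policy evaluation with linear value approximation, alternated with greedy policy improvement), the toy Python sketch below learns an evasive, target-seeking policy on a one-dimensional corridor. The dynamics, one-hot features, and parameters here are invented for illustration; they are not the paper's model-based features or its cell-decomposition decision tree.

```python
import numpy as np

# Toy corridor: the agent earns +1 at a target cell and -1 when it lands
# on the (randomly moving) adversary's cell. All settings are illustrative.
N = 10                                  # corridor cells 0..9
TARGET, GAMMA, LAM, ALPHA = 9, 0.95, 0.8, 0.1
rng = np.random.default_rng(0)

def features(s):
    """One-hot state features; the paper uses richer model-based features."""
    phi = np.zeros(N)
    phi[s] = 1.0
    return phi

def step(s, a, adv):
    """Agent moves by a in {-1, +1}; adversary takes a clipped random walk."""
    s2 = min(max(s + a, 0), N - 1)
    adv2 = min(max(adv + rng.choice([-1, 1]), 0), N - 1)
    r = 1.0 if s2 == TARGET else (-1.0 if s2 == adv2 else 0.0)
    return s2, adv2, r

def td_lambda_eval(policy, w, episodes=200):
    """TD(λ) policy evaluation with eligibility traces and linear weights w."""
    for _ in range(episodes):
        s, adv, z = rng.integers(0, N), 0, np.zeros(N)
        for _ in range(50):
            a = policy(s, w)
            s2, adv, r = step(s, a, adv)
            delta = r + GAMMA * (w @ features(s2)) - w @ features(s)
            z = GAMMA * LAM * z + features(s)   # accumulate trace
            w = w + ALPHA * delta * z           # TD(λ) weight update
            s = s2
    return w

def greedy(s, w):
    """Policy improvement: one-step lookahead on the learned value estimate."""
    return max((-1, 1), key=lambda a: w @ features(min(max(s + a, 0), N - 1)))

# Approximate policy iteration: alternate evaluation and greedy improvement.
w = np.zeros(N)
for _ in range(3):
    w = td_lambda_eval(greedy, w)
# After training, states near the target carry higher estimated value,
# so the greedy policy steers toward the target and away from the adversary.
```

In the paper's setting this evaluation/improvement loop runs online over a decision tree built by cell decomposition, so the policy can be re-planned frequently as the adversaries move.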
Source: Journal of Control Theory and Applications (English edition), EI-indexed, 2011, Issue 3, pp. 391-399 (9 pages)
Funding: supported by the National Science Foundation (No. ECS 0925407)
Keywords: approximate dynamic programming; reinforcement learning; path planning; pursuit-evasion games
