Abstract
A promising approach to learning to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: 1) learning by self-play, 2) learning by playing against an expert program, and 3) learning from viewing experts play against each other. Although the third method generates high-quality games from the start, in contrast to the initial random games generated by self-play, its drawback is that the learning program is never allowed to test the moves it prefers. Since our expert program uses an evaluation function similar to that of the learning program, we also examine whether it is helpful to learn directly from the board evaluations given by the expert. We compared these methods using temporal difference methods with neural networks to learn the game of backgammon.