Prediction Distortion in Monte Carlo Tree Search and an Improved Algorithm

Abstract: Teaching computer programs to play games through machine learning has been an important way to achieve better artificial intelligence (AI) in a variety of real-world applications. Monte Carlo Tree Search (MCTS) is one of the key AI techniques developed recently that enabled AlphaGo to defeat a legendary professional Go player. What makes MCTS particularly attractive is that it requires only the basic rules of the game and does not rely on expert-level knowledge. Researchers thus expect that MCTS can be applied to other complex AI problems where domain-specific expert-level knowledge is not yet available. So far there are very few analytic studies of MCTS in the literature. In this paper, our goal is to develop analytic studies of MCTS to build a more fundamental understanding of the algorithms and their applicability to complex AI problems.

We start with a simple version of MCTS, called random playout search (RPS), to play Tic-Tac-Toe, and find that RPS may fail to discover the correct moves even in a very simple Tic-Tac-Toe position. Both probability analysis and simulation confirm this finding. We continue our studies with the full version of MCTS to play Gomoku and find that, while MCTS has shown great success in playing more sophisticated games like Go, it is not effective at handling sudden death/win. The main reason MCTS often fails to detect sudden death/win lies in its reliance on random playouts, which leads to prediction distortion. Although MCTS in theory converges to the optimal minimax search, under real-world computational resource constraints it has to rely on RPS as an important step in its search process, and it therefore suffers from the same fundamental prediction distortion problem as RPS. By examining the detailed statistics of the scores in MCTS, we investigate a variety of scenarios where MCTS fails to detect sudden death/win.

Finally, we propose an improved MCTS algorithm that incorporates minimax search to overcome prediction distortion. Our simulation confirms the effectiveness of the proposed algorithm. We provide an estimate of the additional computational cost this new algorithm incurs to detect sudden death/win and discuss heuristic strategies to further reduce the search complexity.
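The two ideas named in the abstract, random playout search and a minimax check for sudden death/win, can be illustrated with a minimal Python sketch for Tic-Tac-Toe. This is not the paper's implementation: the board encoding, every function name (`random_playout`, `rps_scores`, `sudden_death_filter`, `choose_move`), the +1/0/-1 scoring, the playout count, and the choice of a depth-2 check are all assumptions made here for illustration.

```python
import random

# Assumed encoding: a board is a list of 9 cells holding 'X', 'O', or ' '.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']


def other(player):
    return 'O' if player == 'X' else 'X'


def random_playout(board, to_move, me, rng):
    """Play uniformly random moves to the end of the game.

    Returns +1 (win), 0 (draw), or -1 (loss) from the viewpoint of `me`.
    """
    board = board[:]
    while True:
        w = winner(board)
        if w is not None:
            return 1 if w == me else -1
        moves = legal_moves(board)
        if not moves:
            return 0
        board[rng.choice(moves)] = to_move
        to_move = other(to_move)


def rps_scores(board, me, playouts=2000, seed=0):
    """Random playout search: mean playout result for each legal move."""
    rng = random.Random(seed)
    scores = {}
    for m in legal_moves(board):
        child = board[:]
        child[m] = me
        total = sum(random_playout(child, other(me), me, rng)
                    for _ in range(playouts))
        scores[m] = total / playouts
    return scores


def sudden_death_filter(board, me):
    """Depth-2 exhaustive check (an assumed, simplified minimax step).

    Returns a move that wins immediately, else the first move found that
    blocks an immediate opponent win, else None.
    """
    moves = legal_moves(board)
    for m in moves:
        child = board[:]
        child[m] = me
        if winner(child) == me:
            return m
    opp = other(me)
    for m in moves:
        child = board[:]
        child[m] = opp
        if winner(child) == opp:
            return m
    return None


def choose_move(board, me, playouts=2000, seed=0):
    """Use the exhaustive check first; fall back to playout averages."""
    forced = sudden_death_filter(board, me)
    if forced is not None:
        return forced
    scores = rps_scores(board, me, playouts, seed)
    return max(scores, key=scores.get)
```

Running the shallow exhaustive check before the playout averages means that a one-move win or loss is never decided by sampling noise, at the cost of scanning every legal move for both players once per decision.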

Author: William Li

Affiliation: Delbarton School

Source: Journal of Intelligent Learning Systems and Applications, 2018, No. 2, pp. 46-79 (34 pages)

Keywords: Monte Carlo Tree Search; Minimax Search; Board Games; Artificial

Classification number: R73 [Medicine and Health: Oncology]

