Abstract
This paper applies reinforcement learning to simulated robot motion control. It introduces a PyGame virtual environment and trains a neural network to approximate the Q function, proposing a reinforcement learning model for robot combat whose input is raw pixels (images) and whose output is a value function; the model is trained in a predefined environment. The core of the model is a neural network used for deep Q-learning, with the DQN algorithm simulating the decision-making process. The Actor-Critic algorithm is also applied during training, and the outputs of the two models are compared and discussed in depth. In addition, the Monte Carlo tree search (MCTS) algorithm is used in place of the traditional Alpha-Beta search algorithm, and other symmetric aspects are changed in order to train strategies for asymmetric situations. Comparison and analysis indicate that a general self-reinforcement learning method can indeed be found.
Authors
陈明阳
刘博
茆意风
Chen Mingyang; Liu Bo; Mao Yifeng (University of Pennsylvania, Pennsylvania 19019)
Source
《中阿科技论坛(中英文)》
2020, No. 9, pp. 170-173 (4 pages)
China-Arab States Science and Technology Forum