Abstract
A long-standing goal of reinforcement learning is to create algorithms that learn challenging domains to a superhuman level of proficiency. This paper designs a self-learning Gomoku (gobang) algorithm based on Monte Carlo tree search (MCTS) and a deep neural network; it learns from scratch, without human knowledge. The deep neural network is a deep residual network with 32 convolutional layers, and MCTS predicts the best move from the results of many simulated games. The rules of Gomoku are combined with MCTS and the network: MCTS uses the network to evaluate board positions and select moves, which strengthens the tree search, improves move quality, and optimizes the self-play iteration. Self-play games generated by MCTS are used to train a neural network that predicts both the move selection and the winner of the game. After two days of training, the algorithm reached an Elo rating of 4000, far above the level of an average human player.
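The abstract does not say how the network output enters the tree search. The sketch below illustrates one common choice, the PUCT selection rule used in AlphaGo Zero-style systems. It is an assumption, not the authors' confirmed method. The names `Node`, `select_child`, `backup`, and the constant `c_puct` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One edge/state in the search tree (hypothetical structure)."""
    prior: float                    # move probability from the network's policy output (assumed)
    visits: int = 0                 # number of simulations that passed through this node
    value_sum: float = 0.0          # sum of backed-up values
    children: dict = field(default_factory=dict)  # move (row, col) -> Node

    def q(self) -> float:
        # Mean value; 0.0 for an unvisited node.
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Return the (move, child) pair maximizing Q + U (PUCT rule)."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)


def backup(path, value: float) -> None:
    """Propagate a leaf evaluation up the path, flipping sign at each ply.

    `value` is from the perspective of the player who moved into the
    last node of `path` (an assumed convention).
    """
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value
```

With no visits, the rule follows the network prior; as visit counts and backed-up values accumulate, moves that evaluate poorly lose priority even if their prior was high.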
Authors
LI Da-zhou (李大舟)
SHEN Xue-yan (沈雪雁)
GAO Wei (高巍)
ZHANG Xiao-ming (张小明)
MENG Zhi-hui (孟智慧)
Affiliations: College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110142, China; China Mobile Group Design Institute Co., Ltd., Hebei Branch, Taiyuan 030000, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2020, No. 6, pp. 1169-1175 (7 pages)
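Funding
Supported by the General Scientific Research Project of the Education Department of Liaoning Province (LQ2017008, L2016011)
Supported by the Postdoctoral Start-up Project of the Science and Technology Department of Liaoning Province (201601196)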