Abstract
Computer game programs based on deep reinforcement learning, such as AlphaGo, have defeated human world champions at the game of Go. These algorithms use learnable value and policy neural networks to guide the exploration of Monte Carlo Tree Search (MCTS). Various enhancements have been proposed to improve MCTS performance, among which the transposition table has been shown to improve search efficiency. Building on this foundation, this paper proposes a new transposition-table-based method: a two-level transposition table optimization algorithm based on deep reinforcement learning. The method manages the two-level transposition table with distinct replacement strategies and decouples the two-stone move of Connect6 into two independent neural networks, which both reduces the size of the action space and makes the networks easier to train. Experiments on Connect6 show that, under limited computational resources, the method significantly improves the position hash hit rate and the program's playing strength.
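The abstract names the core data structure but not its concrete replacement policies. The following is a minimal Python sketch, assuming a Zobrist-hashed board, a depth-preferred first tier, and an always-replace second tier; the class names, stored fields, and board size are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a two-level transposition table for Connect6-style positions.
# Tier choice and replacement policies are assumptions for illustration only.
import random
from dataclasses import dataclass
from typing import Optional

BOARD_SIZE = 19      # a common Connect6 board size (assumption)
NUM_PLAYERS = 2

# Zobrist hashing: one random 64-bit key per (cell, player) pair.
ZOBRIST = [[random.getrandbits(64) for _ in range(NUM_PLAYERS)]
           for _ in range(BOARD_SIZE * BOARD_SIZE)]

def zobrist_hash(stones):
    """stones: iterable of (cell_index, player) pairs describing a position."""
    h = 0
    for cell, player in stones:
        h ^= ZOBRIST[cell][player]
    return h

@dataclass
class Entry:
    key: int        # full hash, used to detect index collisions
    value: float    # stored value estimate for the position
    visits: int     # MCTS visit count when the entry was stored
    depth: int      # search depth of the stored result

class TwoLevelTranspositionTable:
    """Two tiers indexed by the same hash: tier 1 keeps the deeper result
    (depth-preferred replacement), tier 2 always replaces."""

    def __init__(self, size_bits: int = 20):
        self.mask = (1 << size_bits) - 1
        self.tier1: list[Optional[Entry]] = [None] * (self.mask + 1)
        self.tier2: list[Optional[Entry]] = [None] * (self.mask + 1)

    def store(self, key: int, value: float, visits: int, depth: int) -> None:
        idx = key & self.mask
        entry = Entry(key, value, visits, depth)
        slot = self.tier1[idx]
        if slot is None or depth >= slot.depth:
            self.tier1[idx] = entry      # depth-preferred tier
            if slot is not None:
                self.tier2[idx] = slot   # demote the displaced entry
        else:
            self.tier2[idx] = entry      # always-replace tier

    def probe(self, key: int) -> Optional[Entry]:
        idx = key & self.mask
        for slot in (self.tier1[idx], self.tier2[idx]):
            if slot is not None and slot.key == key:
                return slot
        return None
```

Probing checks both tiers, so a deep result displaced by a collision in the first tier can still be reused; this is the usual rationale for two-level tables over a single always-replace table.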
Authors
王栋年
王军伟
薛世超
汪超
徐长明
WANG Dongnian; WANG Junwei; XUE Shichao; WANG Chao; XU Changming (Graduate, Northeastern University, Qinhuangdao 066004, China; School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China)
Source
《重庆理工大学学报(自然科学)》
CAS
Peking University Core Journal (北大核心)
2024, No. 5, pp. 145-153 (9 pages)
Journal of Chongqing University of Technology (Natural Science)
Funding
Supported by the General Program of the Natural Science Foundation of Hebei Province (F2023501006).