Research on mahjong game playing combining A2C with a hand valuation method
Abstract To address the insufficient use of hand information in popular mahjong, this paper proposes a hand valuation method and designs a basic mahjong program (MJE). To further improve the playing strength of the mahjong AI, a second AI (MJE-RL) is designed using deep reinforcement learning. First, training data for deep learning is generated through MJE self-play. Second, based on results on the training set, the test set, and comparison experiments, the best-performing model is selected as the pre-trained model for reinforcement learning. Finally, the Advantage Actor-Critic (A2C) model serves as the main reinforcement learning framework: the trained deep learning model acts as the Actor that makes decisions, and the AI's strength is improved continually through games between MJE-RL and MJE. Experimental results show that the winning rate of MJE-RL is 4.08% higher than that of MJE and that its deal-in rate (点炮率, the rate of discarding a tile on which an opponent wins) is 3.02% lower. This indicates that MJE-RL improves on both offense and defense, achieving the goal of a stronger mahjong AI.
Authors YI Yuhan (衣御寒); WANG Yajie (王亚杰); WU Yanyan (吴燕燕); LIU Song (刘松); ZHANG Xinghui (张兴慧); JIANG Chuanyu (蒋传禹) (Engineering Training Center, Shenyang Aerospace University, Shenyang 110136, China)
Source Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), indexed in CAS and the Peking University Core Journals list (北大核心), 2024, No. 5, pp. 154-161 (8 pages)
Funding Liaoning Revitalization Talents Program (辽宁省兴辽英才计划) project (XLYC1906003).
Keywords popular mahjong; incomplete information; deep reinforcement learning; A2C
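
The abstract names Advantage Actor-Critic (A2C) as the reinforcement learning framework, with the pre-trained model acting as the Actor. As a minimal sketch of what one A2C update computes for a single transition, the following uses the textbook one-step formulation. It is not the authors' implementation: the function names, the discount factor `gamma`, and the toy inputs are all assumptions made for illustration, and the record gives no details of the network, reward, or state encoding.

```python
import math


def softmax(logits):
    """Turn raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_losses(logits, action, reward, value, next_value, done, gamma=0.99):
    """Textbook one-step A2C terms for a single transition.

    logits     -- Actor scores over the available actions (hypothetical)
    action     -- index of the action actually taken
    value      -- Critic estimate V(s) for the current state
    next_value -- Critic estimate V(s') for the next state
    done       -- True if the game ended on this transition
    gamma      -- discount factor (assumed, not from the paper)
    """
    probs = softmax(logits)
    # Bootstrapped target: r + gamma * V(s'), with no bootstrap at game end.
    target = reward + (0.0 if done else gamma * next_value)
    # Advantage: how much better the outcome was than the Critic expected.
    advantage = target - value
    # Actor: raise the log-probability of actions with positive advantage.
    actor_loss = -math.log(probs[action]) * advantage
    # Critic: squared error between V(s) and the bootstrapped target.
    critic_loss = advantage ** 2
    return advantage, actor_loss, critic_loss


if __name__ == "__main__":
    # Toy transition: two equally likely actions, terminal reward of 1.
    print(a2c_losses([0.0, 0.0], 0, 1.0, 0.5, 0.0, True))
```

In a real training loop the advantage would be treated as a constant when differentiating the Actor loss, and the two losses would be minimized by gradient descent over batches of self-play transitions.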