Solving GuanDan Poker Games with Deep Reinforcement Learning
Abstract: Making decisions in complex environments under uncertain information is a difficulty people often face in the real world, so the ability to make good decisions is regarded as an important capability of artificial intelligence. Games are a high-level abstraction of the real world; they are well defined and make it easy to compare algorithms, and have therefore become a mainstream research testbed. Poker games such as GuanDan are hard to solve efficiently: the other players' hands are hidden, and both the number of available actions and the number of possible opponent hands (the information set size) are very large. This work proposes a soft deep Monte Carlo (SDMC) method to address these difficulties. By taking into account how an expert strategy acts during training, SDMC makes better use of domain knowledge and accelerates policy learning. During real-time play, SDMC applies a soft action sampling strategy to adjust its decisions, which confuses opponents and prevents potential exploitation by them, improving the win rate against different agents. The policy model trained with SDMC won the 2nd Chinese Artificial Intelligence Game Algorithm Competition. Experiments on training time and final performance, comparing against the champion strategy of the 1st competition and other strategy models from the 2nd competition, demonstrate the effectiveness of the method for GuanDan.
Authors: Ge Zhenxing; Xiang Shuai; Tian Pinzhuo; Gao Yang (National Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023; School of Computer Engineering and Science, Shanghai University, Shanghai 200444; Shenzhen Research Institute of Nanjing University, Shenzhen, Guangdong 518057)
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2024, No. 1, pp. 145-155 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
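Funding: National Key Research and Development Program of China (2018AAA0100905); National Natural Science Foundation of China (62192783, 62276142, 62206166); Jiangsu Provincial Industry Foresight and Key Core Technology Competition Project (BE2021028); Shenzhen Fund for Central Government Guidance of Local Science and Technology Development (2021Szvup056); Shanghai Sailing Program (23YF1413000).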
Keywords: imperfect information; deep reinforcement learning; multi-agent system; soft deep Monte Carlo method; poker game
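
The abstract mentions a soft action sampling strategy for real-time play but does not spell out the rule. The sketch below is one plausible reading only, not the authors' implementation: rather than always playing the highest-valued action, it samples among actions whose estimated value is close to the best. The function name `soft_sample_action` and the parameters `temperature` and `margin` are hypothetical and do not come from the paper.

```python
import math
import random


def soft_sample_action(q_values, temperature=0.1, margin=0.05, rng=None):
    """Sample an action index among near-best actions.

    Illustrative assumption, not the paper's rule: candidates are the
    actions whose estimated value lies within `margin` of the best one,
    and they are drawn with softmax weights at the given `temperature`.
    """
    if not q_values:
        raise ValueError("q_values must be non-empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()

    best = max(q_values)
    # Keep only actions whose estimated value is within `margin` of the best.
    candidates = [i for i, q in enumerate(q_values) if best - q <= margin]
    # Softmax weights; subtracting `best` keeps exp() numerically stable.
    weights = [math.exp((q_values[i] - best) / temperature) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical value estimates for three legal actions.
    estimates = [0.50, 0.49, 0.10]
    print(soft_sample_action(estimates, rng=random.Random(0)))
```

With `margin=0.0` the sketch reduces to greedy selection whenever the best action is unique. A larger `margin` or `temperature` makes play less predictable, at the cost of sometimes choosing a slightly lower-valued action.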