A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games (cited by: 5)

Abstract: Solving the optimization problem to approach a Nash Equilibrium point plays an important role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash Equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm that combines Monte Carlo tree search with NFSP, called MC-NFSP, to improve performance in real-time zero-sum imperfect-information games. With experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate a Nash Equilibrium in games with large-scale search depth, while NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play framework (ANFSP). It uses an asynchronous, parallel architecture to collect game experience and improves both training efficiency and policy quality. Experiments on games with hidden state information (Texas Hold'em) and on first-person shooter (FPS) games demonstrate the effectiveness of our algorithms.

Authors: Li ZHANG, Yuxuan CHEN, Wei WANG, Ziliang HAN, Shijian LI, Zhijie PAN, Gang PAN

Affiliation: College of Computer Science and Technology

Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2021, Issue 5, pp. 137-150 (14 pages). Chinese journal title: 中国计算机科学前沿（英文版）

Funding: National Key Research and Development Program of China (2017YFB1002503); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100902), China.

Keywords: approximate Nash Equilibrium; imperfect-information games; dynamic games; Monte Carlo tree search; Neural Fictitious Self-Play; reinforcement learning

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

