
Methods in Adversarial Intelligent Game: A Holistic Comparative Analysis from the Perspective of Game Theory and Reinforcement Learning (cited by: 13)
Abstract: Adversarial intelligent game is a pressing frontier problem in cognitive decision-making for artificial intelligence. Backed by large-scale computing power, game-theoretic methods, represented by counterfactual regret minimization, and reinforcement learning methods, represented by fictitious self-play, have stood out in solving for strategies in such games. The relationship between the two paradigms, however, has not been explored in depth. This paper defines the connotation and extension of adversarial intelligent game, traces its development history, and summarizes the key challenges. It introduces the models and algorithms of adversarial intelligent game from the perspectives of game theory and reinforcement learning, and compares the strengths and limitations of the two from multiple angles. It then summarizes the methods and strategy-solving frameworks that arise under a unified view of game theory and reinforcement learning. The aim is to offer directions for combining the two paradigms, to advance intelligent game technology, and to lay groundwork for general artificial intelligence.
Authors: YUAN Wei-lin; LUO Jun-ren; LU Li-na; CHEN Jia-xing; ZHANG Wan-peng; CHEN Jing (College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list; 2022, No. 8, pp. 191-204 (14 pages)
Funding: National Natural Science Foundation of China (61702528, 61806212, 62173336).
Keywords: Adversarial intelligent game; Counterfactual regret minimization; Fictitious self-play; Nash equilibrium; Reinforcement learning
