
A Meta-evolutionary Learning Algorithm for Opponent Adaptation in Two-player Zero-sum Games (Cited by: 1)
Abstract: Research on two-player zero-sum games has recently achieved landmark breakthroughs in games such as Go and Texas Hold'em. Most existing solution methods for two-player zero-sum games assume rational opponents and approximate a Nash equilibrium, a conservative strategy that strives to be unbeatable but does not guarantee maximum payoff in practice, for example when the opponent plays irrationally. Opponent modeling offers a new route to payoff maximization, yet building accurate opponent models remains difficult. Drawing on ideas from meta-learning, this paper proposes a meta-evolutionary learning framework that can quickly adapt to an opponent's strategy. In the training phase, a population-evolution method continually generates opponents with diverse playing styles as training data, and a meta-strategy update method then adjusts the network weights of the meta-model so that it acquires the ability to adapt rapidly. Extensive experiments on Leduc poker, heads-up limit Texas Hold'em (LHE), and RoboSumo show that the algorithm effectively overcomes the drawbacks of existing methods and adapts quickly to opponents of unknown styles, thereby providing a new approach to maximizing payoff in two-player zero-sum games.
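The abstract describes a two-stage training procedure: evolving a population of diverse-style opponents to serve as training data, and applying a meta-strategy update so the meta-model gains fast-adaptation ability. The sketch below illustrates that general idea only, under the assumption of a first-order MAML-style inner/outer loop in PyTorch; the paper's actual update rule, fitness measure, and network architecture are not reproduced here, and every name in the code (PolicyNet, evolve_population, meta_train, and the user-supplied rollout_loss, which plays episodes between two policies and returns a differentiable loss for the first one) is a hypothetical placeholder rather than the authors' implementation.

# Illustrative sketch only: population evolution of opponents plus a
# first-order MAML-style meta-update. All names are hypothetical and are
# not taken from the paper.
import copy
import random

import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Small policy network mapping observations to action logits."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def evolve_population(population, fitness, mutation_std=0.02, keep_ratio=0.5):
    """Keep the fittest opponents and refill the population with mutated copies."""
    ranked = [p for _, p in sorted(zip(fitness, population),
                                   key=lambda pair: pair[0], reverse=True)]
    survivors = ranked[: max(1, int(len(ranked) * keep_ratio))]
    children = []
    while len(survivors) + len(children) < len(population):
        child = copy.deepcopy(random.choice(survivors))
        with torch.no_grad():
            for p in child.parameters():          # Gaussian parameter mutation
                p.add_(mutation_std * torch.randn_like(p))
        children.append(child)
    return survivors + children


def meta_train(meta_policy, population, rollout_loss,
               meta_iters=1000, inner_lr=0.1, outer_lr=1e-3, inner_steps=1):
    """Adapt a copy of the meta-policy to each sampled opponent, then update
    the meta-parameters from the post-adaptation loss (first-order meta-gradient)."""
    meta_opt = torch.optim.Adam(meta_policy.parameters(), lr=outer_lr)
    for _ in range(meta_iters):
        meta_opt.zero_grad()
        for opp in random.sample(population, k=min(4, len(population))):
            adapted = copy.deepcopy(meta_policy)
            inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
            for _ in range(inner_steps):          # fast adaptation to this opponent
                inner_opt.zero_grad()
                rollout_loss(adapted, opp).backward()
                inner_opt.step()
            inner_opt.zero_grad()
            rollout_loss(adapted, opp).backward()  # loss after adaptation
            for mp, ap in zip(meta_policy.parameters(), adapted.parameters()):
                if ap.grad is not None:            # accumulate first-order meta-gradient
                    mp.grad = ap.grad.clone() if mp.grad is None else mp.grad + ap.grad
        meta_opt.step()
        # One possible fitness choice (an assumption): opponents that give the
        # current meta-policy a higher loss are considered fitter.
        with torch.no_grad():
            fitness = [rollout_loss(meta_policy, opp).item() for opp in population]
        population = evolve_population(population, fitness)
    return meta_policy

In use, one would initialize population = [PolicyNet(obs_dim, n_actions) for _ in range(N)] and supply a game-specific rollout_loss; the first-order approximation is chosen here only to keep the sketch short, not because the paper specifies it.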
Authors: WU Zhe (吴哲), LI Kai (李凯), XU Hang (徐航), XING Jun-Liang (兴军亮) (Center for Research on Intelligent System and Engineering, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049; Department of Computer Science and Technology, Tsinghua University, Beijing 100084)
Source: Acta Automatica Sinica (《自动化学报》), 2022, No. 10, pp. 2462-2473 (12 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Key Research and Development Program of China (2020AAA0103401); National Natural Science Foundation of China (62076238, 61902402); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27000000); CCF-Tencent Rhino-Bird Open Research Fund (RAGR20200104).
Keywords: two-player zero-sum games; Nash equilibrium; opponent modeling; meta learning; population evolution