期刊文献+

重复囚徒困境的学习和响应模型 被引量:2

The Learning and Response Model for Iterated Prisoner's Dilemma
下载PDF
导出
摘要 囚徒困境问题是博弈论的一个重要范例,对此的研究涉及经济学、社会学、生物学等广泛领域。Axelrod R在文献[1]中从进化的角度研究和探讨了经典囚徒困境的一个扩展——重复囚徒困境。这种博弈要求参与者反复进行囚徒困境的博弈,并且可以记住他们的对抗历史。Axelrod还组织了两次重复囚徒困境的计算机竞赛,最终胜出的都是简单的"以牙还牙"策略[2]。这之后有不少学者试图找到可以击败它的策略,都未能取得显著成功。本文提出了一种学习和响应的理论模型,实际中的许多重复囚徒困境的策略都可以纳入这一模型中。我们分析了实现这一模型的难点和复杂度,同时给出了一种基于树结构的实现方式,并在实验中把它和"以牙还牙"作比较。实验以及分析表明,策略在竞赛中表现的优劣主要取决于如何利用一些启发式规则来权衡学习代价和博弈的总利益,以及在此基础上如何抽取对手的关键信息。 Being an important example in game theory, Prisoner's Dilemma (PD) has attracted widespread attention in a variety of disciplines such as economics, sociology and biology. Iterated Prisoner's Dilemma (IPD)was studied by Robert Axelrod in[1]to model the evolution of cooperation. In the game of IPD, two players repeatedly play PD and have the memory of the previous encounters. In the computer IPD tournaments organized by Axelrod, a simple strategy TFT (Tit for it) won twice. After that, a great mtmber of researchers sought to design new strategies to beat TFT in the tournament, but without much success. We propose a learning and response model for IPD which has many real-life strategies as its implementations. After analyzing the difficulty and complexity of implementing our model, we give a tree-based strategy and com- pare it with TFT in IPD toumaments~ Experimental results show that a strategy's behavior largely depends on its way to balance the trade-off between the learning cost and the overall payoff,and depends on how it investigates and utilizes its opponent's characteristics based on what it has learned.
出处 《计算机工程与科学》 CSCD 2007年第10期115-119,共5页 Computer Engineering & Science
关键词 囚徒困境 重复囚徒困境 博弈论 学习和响应 prisoner' s dilemma iterated prisoner' s dilemma game theory learning and response
  • 相关文献

参考文献12

  • 1Axelrod R M. The Evolution of Cooperation[M]. New York: Basic Books, 1984.
  • 2Axelrod R M. The Evolution of Strategies in the Iterated Prisoner's Dilemma[A]. Davised Led. Genetic Algorithms and Simulated Annealing [M]. Morgan Kaufmann, 1989.
  • 3Prisipner' s Dilemma[EB/OL]. http:// plato, stanford, edu/ entries/prisoner-dilemma/, 2006-02.
  • 4Gilboa I. The Complexity of Computing Best-Response Automata in Repeated Games[J]. Journal of Economic Theory, 1988,45(2):342-352.
  • 5Dasdan A, Irani S S, Gupta R K. Efficient Algorithms for Optimum Cycle Mean and Optimum Cost to Time Ratio Problems[A]. Proc of the 36th ACM/IEEE Conf on Design Automation Table of Contents[C]. 1999.
  • 6Miller J H. The Coevolution of Automata in the Repeated Prisoner's Dilemma[J]. Journal of Economic Behavior & Organization, 1996,29(1) :87-112.
  • 7Darwen P, Yao X. Co-Evolution in Iterated Prisoner's Dilemma with Intermediate Levels of Cooperation: Application to Missile Defense[J]. International Journal of Computational Intelligence and Applications, 2002,2 ( 1 ) : 83-107.
  • 8Birk A. Evolution of Continuous Degrees of Cooperation in an N-Player Iterated Prisoner's Dilemma[M]. Kluwer Academic Publishers, 1999.
  • 9Harrald P G, Fogel D B. Evolving Continuous Behaviors in the Iterated Prisoner's Dilemma[M]. Elsevier Science Ireland Ltd, 1996.
  • 10Frean M. The Evolution of Degrees of Cooperation[M]. Academic Press Limited, 1996.

同被引文献5

引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部