The Learning and Response Model for Iterated Prisoner's Dilemma

Times cited: 2

Abstract: The Prisoner's Dilemma (PD) is an important example in game theory and has attracted attention in a wide range of disciplines, including economics, sociology and biology. In [1], Robert Axelrod studied an extension of the classic game, the Iterated Prisoner's Dilemma (IPD), from an evolutionary perspective in order to model the evolution of cooperation. In the IPD, two players play the PD repeatedly and remember the history of their previous encounters. Axelrod also organized two computer IPD tournaments, and both were won by the simple strategy TFT (Tit for Tat) [2]. Since then, many researchers have tried to design strategies that beat TFT in such tournaments, without notable success. We propose a learning and response model for the IPD that covers many practical IPD strategies as instances. We analyze the difficulty and complexity of implementing the model, present a tree-based implementation, and compare it experimentally with TFT in IPD tournaments. The experiments and analysis indicate that how well a strategy performs in a tournament depends mainly on how it uses heuristic rules to balance the trade-off between the cost of learning and the overall payoff, and on how, based on what it has learned, it extracts and exploits key information about its opponent.
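The tournament setting summarized above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' learning and response model or their tree-based strategy. It assumes the payoffs T = 5, R = 3, P = 1, S = 0 used in Axelrod's tournaments, 200 rounds per match, and a round robin in which each strategy also plays a copy of itself; the strategy labels GRIM, ALLD and ALLC and every function name are hypothetical choices for this sketch.

```python
from typing import Callable, Dict, List, Tuple

C, D = "C", "D"  # cooperate / defect

# Assumed payoffs (row player, column player): T = 5, R = 3, P = 1, S = 0.
PAYOFF: Dict[Tuple[str, str], Tuple[int, int]] = {
    (C, C): (3, 3),  # mutual cooperation: reward R
    (C, D): (0, 5),  # sucker's payoff S vs. temptation T
    (D, C): (5, 0),
    (D, D): (1, 1),  # mutual defection: punishment P
}

# A strategy sees its own move history and the opponent's move history.
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(mine: List[str], theirs: List[str]) -> str:
    """Cooperate in round 1, then repeat the opponent's previous move."""
    return theirs[-1] if theirs else C


def grim_trigger(mine: List[str], theirs: List[str]) -> str:
    """Cooperate until the opponent defects once, then defect forever."""
    return D if D in theirs else C


def always_defect(mine: List[str], theirs: List[str]) -> str:
    return D


def always_cooperate(mine: List[str], theirs: List[str]) -> str:
    return C


def play_match(a: Strategy, b: Strategy, rounds: int = 200) -> Tuple[int, int]:
    """Play `rounds` rounds of PD; both players remember the full history."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(players: Dict[str, Strategy], rounds: int = 200) -> Dict[str, int]:
    """Every pair meets once; each player also meets a copy of itself."""
    names = list(players)
    totals = {name: 0 for name in names}
    for i, x in enumerate(names):
        for y in names[i:]:
            sx, sy = play_match(players[x], players[y], rounds)
            totals[x] += sx
            if y != x:  # count a self-play match only once
                totals[y] += sy
    return totals


if __name__ == "__main__":
    pool = {
        "TFT": tit_for_tat,
        "GRIM": grim_trigger,
        "ALLD": always_defect,
        "ALLC": always_cooperate,
    }
    print(play_match(tit_for_tat, always_defect))  # (199, 204)
    print(round_robin(pool))  # {'TFT': 1999, 'GRIM': 1999, 'ALLD': 1608, 'ALLC': 1800}
```

In this toy pool TFT never outscores its opponent in a single match (199 vs. 204 against ALLD), yet it ties for first place overall because it sustains mutual cooperation with every cooperative strategy while losing only one round to ALLD.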

Authors: Song Yiling, Wang Bingzhong, Zhu Hong, Cai Sheng

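Affiliations: Department of Computer Science and Engineering, Fudan University; Department of Computer Science and Technology, Nanjing University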

Source: Computer Engineering & Science (计算机工程与科学, indexed in CSCD), 2007, No. 10, pp. 115-119 (5 pages)

Keywords: prisoner's dilemma; iterated prisoner's dilemma; game theory; learning and response

Classification number (CLC): TP311 [Automation and Computer Technology: Computer Software and Theory]


References (12)

[1] Axelrod R M. The Evolution of Cooperation[M]. New York: Basic Books, 1984.
[2] Axelrod R M. The Evolution of Strategies in the Iterated Prisoner's Dilemma[A]. Davis L, ed. Genetic Algorithms and Simulated Annealing[M]. Morgan Kaufmann, 1989.
[3] Prisoner's Dilemma[EB/OL]. http://plato.stanford.edu/entries/prisoner-dilemma/, 2006-02.
[4] Gilboa I. The Complexity of Computing Best-Response Automata in Repeated Games[J]. Journal of Economic Theory, 1988, 45(2): 342-352.
[5] Dasdan A, Irani S S, Gupta R K. Efficient Algorithms for Optimum Cycle Mean and Optimum Cost to Time Ratio Problems[A]. Proc of the 36th ACM/IEEE Conf on Design Automation[C]. 1999.
[6] Miller J H. The Coevolution of Automata in the Repeated Prisoner's Dilemma[J]. Journal of Economic Behavior & Organization, 1996, 29(1): 87-112.
[7] Darwen P, Yao X. Co-Evolution in Iterated Prisoner's Dilemma with Intermediate Levels of Cooperation: Application to Missile Defense[J]. International Journal of Computational Intelligence and Applications, 2002, 2(1): 83-107.
[8] Birk A. Evolution of Continuous Degrees of Cooperation in an N-Player Iterated Prisoner's Dilemma[M]. Kluwer Academic Publishers, 1999.
[9] Harrald P G, Fogel D B. Evolving Continuous Behaviors in the Iterated Prisoner's Dilemma[M]. Elsevier Science Ireland Ltd, 1996.
[10] Frean M. The Evolution of Degrees of Cooperation[M]. Academic Press Limited, 1996.

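Co-cited literature (5)

[1] Liu Zhen, Ren Yulong, Tang Songlin. An Iterated Prisoner's Dilemma Game Model Based on Mealy Automata[J]. Management Science (管理科学), 2006, 19(5): 66-70. Times cited: 6
[2] Bao Shihong. A Customer Purchase Decision Model Based on Quality and Price[J]. Consumer Guide (消费导刊), 2010(4): 7-10. Times cited: 1
[3] Liao Liefa, Sun Wei, Liu Chaoyang. The Effects of Mobility and Noise on Cooperation Studied with Evolutionary Games[J]. Computer Applications and Software (计算机应用与软件), 2015, 32(3): 53-56. Times cited: 4
[4] Chen Weichun, Shang Lihui. Research on a Prisoner's Dilemma Game Model Based on a Reward Factor[J]. Electronic Science and Technology (电子科技), 2016, 29(3): 5-6. Times cited: 6
[5] Liu Hua, Li Ying, Zhao Jianli, Ge Meixia. The Effect of the Silence Strategy on the Level of Cooperation in the Prisoner's Dilemma Game[J]. Mathematics in Practice and Theory (数学的实践与认识), 2016, 46(20): 240-247. Times cited: 3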

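Citing literature (2)

[1] Tang Chen. Research on Prisoner's Dilemma Game Models Based on Multiple Reward Mechanisms[J]. Scientist (科学家), 2017, 5(24): 54-56.
[2] Wang Xue. How to Avoid the Prisoner's Dilemma in the Purchase Decision Process[J]. Silk Road Vision (丝路视野), 2018(5): 6-7.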

