
An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game (cited by: 8)

Abstract: The iterated prisoner's dilemma (IPD) is an ideal model for analyzing interactions between agents in complex networks. Since the success of tit-for-tat in Axelrod's tournament, it has attracted wide interest in the development of novel strategies. This paper studies a new adaptive strategy for the IPD in different complex networks, where agents learn and adapt their strategies through reinforcement learning. A temporal difference learning method is used to design the adaptive strategy and to optimize the agents' decision-making process. Previous studies indicated that mutual cooperation rarely emerges in the IPD. Therefore, three examples based on a square lattice network and a scale-free network are provided to show two features of the adaptive strategy. First, mutual cooperation can be achieved by a group of adaptive agents in a scale-free network, and once evolution has converged to mutual cooperation, it is unlikely to shift. Second, the adaptive strategy earns a better payoff than other strategies in the square lattice network. Analytical properties are discussed to verify the evolutionary stability of the adaptive strategy.
Source: IEEE/CAA Journal of Automatica Sinica (自动化学报(英文版), the English edition of Acta Automatica Sinica), indexed in SCIE, EI, CSCD; 2018, No. 1, pp. 301-310 (10 pages)
Funding: Supported by the National Natural Science Foundation (NNSF) of China (61603196, 61503079, 61520106009, 61533008); the Natural Science Foundation of Jiangsu Province of China (BK20150851); the China Postdoctoral Science Foundation (2015M581842); the Jiangsu Postdoctoral Science Foundation (1601259C); the Nanjing University of Posts and Telecommunications Science Foundation (NUPTSF) (NY215011); the Priority Academic Program Development of Jiangsu Higher Education Institutions; the open fund of the Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education (MCCSE2015B02); and the Research Innovation Program for College Graduates of Jiangsu Province (CXLX1309).
Keywords: complex network; prisoner's dilemma; reinforcement learning; temporal difference learning
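
The abstract says the adaptive strategy is designed with a temporal difference learning method but does not give the update rule. As a rough illustration only, the sketch below shows how a one-step tabular TD update (Q-learning) could drive a player in a two-player IPD against tit-for-tat. Everything in it is an assumption made for illustration and not the paper's method: the payoff values (the conventional 5/3/1/0), the state (the opponent's previous move), the epsilon-greedy choice, and the names `TDAgent` and `play_vs_tit_for_tat`. The network structure studied in the paper is not modeled.

```python
import random

# Conventional prisoner's dilemma payoffs (T=5 > R=3 > P=1 > S=0).
# Assumed values; the abstract does not state the payoff matrix.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


class TDAgent:
    """Hypothetical adaptive player using a one-step tabular TD update."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # State = opponent's previous move (None before the first round).
        self.q = {(s, a): 0.0 for s in (None, "C", "D") for a in ACTIONS}

    def act(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the best action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # TD error: reward plus discounted best next value, minus current estimate.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


def play_vs_tit_for_tat(agent, rounds=1000):
    """Return the agent's average payoff per round against tit-for-tat."""
    state, tft_move, total = None, "C", 0
    for _ in range(rounds):
        action = agent.act(state)
        reward = PAYOFF[(action, tft_move)]
        next_state = tft_move  # the move the opponent just played
        agent.update(state, action, reward, next_state)
        total += reward
        # Tit-for-tat repeats the agent's latest move in the next round.
        state, tft_move = next_state, action
    return total / rounds
```

Calling `play_vs_tit_for_tat(TDAgent())` returns an average payoff between 0 and 5; what the agent actually learns depends on the assumed parameters and state encoding, so the sketch says nothing about the results reported in the paper.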