Abstract
Evolutionary game theory provides a key framework for addressing social dilemmas, and it need not be restricted to a single, uniform time scale. Reinforcement learning, in turn, has proven to be an effective tool for studying strategy-update dynamics and agents' learning processes in games. This paper therefore studies how a time-scale mechanism combined with a self-focused Q-learning algorithm affects cooperation in the spatial prisoner's dilemma game. Specifically, game interaction and strategy updating take place on different time scales, time-scale diversity enters the formula for the probability of a strategy update, and self-focused Q-learning serves as the strategy-update rule. Numerical results show that this framework significantly promotes cooperation. Finally, we analyze the parameters that affect Q-learning and verify the robustness of the mechanism under different initial settings.
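The abstract does not state the model's equations, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' model. Assumed here: a weak prisoner's dilemma (R = 1, T = b, S = P = 0) on an L x L periodic lattice with von Neumann neighbours; a "self-focused" agent whose Q-learning state is its own previous action; the standard tabular update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(a, .) - Q(s, a)); epsilon-greedy action choice; and time-scale diversity modeled as a hypothetical per-agent revision probability drawn uniformly from `p_range`. Games are played every round (fast time scale), while each agent revises its strategy only occasionally (slow time scale). The names `simulate`, `p_range` and `rate` are illustrative.

```python
import random


def simulate(L=20, b=1.2, alpha=0.1, gamma=0.9, epsilon=0.02,
             p_range=(0.2, 1.0), rounds=200, seed=0):
    """Return the fraction of cooperators after each round (0 = C, 1 = D).

    Hypothetical sketch: payoffs, state definition and the time-scale rule
    are assumptions, not the paper's exact formulas.
    """
    rng = random.Random(seed)
    n = L * L
    state = [rng.randrange(2) for _ in range(n)]   # own previous action
    action = [rng.randrange(2) for _ in range(n)]  # own current action
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n)]  # q[i][state][action]
    # Assumed time-scale diversity: agent i revises its strategy with its
    # own probability rate[i], drawn uniformly from p_range.
    rate = [rng.uniform(*p_range) for _ in range(n)]

    def neighbours(i):
        x, y = divmod(i, L)  # periodic von Neumann neighbourhood
        return (((x + 1) % L) * L + y, ((x - 1) % L) * L + y,
                x * L + (y + 1) % L, x * L + (y - 1) % L)

    def payoff(a, c):
        # Weak prisoner's dilemma: R = 1, T = b, S = P = 0.
        if c == 1:
            return 0.0
        return 1.0 if a == 0 else b

    history = []
    for _ in range(rounds):
        # Fast time scale: every agent plays its four neighbours each round.
        reward = [sum(payoff(action[i], action[j]) for j in neighbours(i))
                  for i in range(n)]
        next_action = action[:]
        for i in range(n):
            if rng.random() >= rate[i]:
                continue  # slow time scale: agent i keeps its strategy
            s, a = state[i], action[i]
            # Tabular Q-learning; the next state is the agent's own action a.
            target = reward[i] + gamma * max(q[i][a])
            q[i][s][a] += alpha * (target - q[i][s][a])
            state[i] = a
            # Epsilon-greedy choice of the next action from state a.
            if rng.random() < epsilon:
                next_action[i] = rng.randrange(2)
            else:
                next_action[i] = 0 if q[i][a][0] >= q[i][a][1] else 1
        action = next_action  # synchronous update of all revising agents
        history.append(action.count(0) / n)
    return history


if __name__ == "__main__":
    frac = simulate()
    print(f"final cooperation fraction: {frac[-1]:.3f}")
```

Setting `p_range=(1.0, 1.0)` recovers a single time scale in which every agent revises every round, which gives a simple baseline against which heterogeneous revision rates can be compared in this toy setting.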
Source
Operations Research and Fuzziology (《运筹与模糊学》), 2024, No. 1, pp. 131-139 (9 pages)