Abstract
Conventional game-theoretic models assume that players are perfectly rational, which rarely matches reality; games with bounded (limited) rationality describe real problems better. A boundedly rational player in an incomplete-information game learns and adapts to the game's rules, structure, and opponents only gradually, so the game should be modeled as a dynamically evolving process. To address this, an incomplete-information game model based on the Q-learning algorithm is proposed. Strategy-selection probability distributions under a multi-index system are established according to Littman's max-min principle, and a mathematical model combining Q-learning with the game is constructed, in which the Q-learning mechanism drives the dynamic evolution of the game model. Finally, the model is applied to a two-player pursuit simulation; the results show that the proposed model reproduces the pursuit scenario well.
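For orientation, Littman's max-min principle is usually stated as the minimax-Q rule for two-player zero-sum Markov games. The abstract does not say whether the paper uses exactly this form, nor how the multiple payoff indexes enter it, so the following is only a sketch under assumed notation: s is the state, a the player's action, o the opponent's action, \pi a mixed strategy over the player's action set A, r the reward, \alpha the learning rate, and \gamma the discount factor.

```latex
% State value: the best worst-case expected Q over mixed strategies \pi
V(s) = \max_{\pi \in \Pi(A)} \; \min_{o \in O} \; \sum_{a \in A} \pi(a)\, Q(s, a, o)

% Q-value update after observing reward r and next state s'
Q(s, a, o) \leftarrow (1 - \alpha)\, Q(s, a, o) + \alpha \left[ r + \gamma\, V(s') \right]
```

The maximizing \pi in the first line is the strategy-selection probability distribution for state s; repeating the second line as play proceeds is what would let such a model evolve with experience.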
Source
System Simulation Technology (《系统仿真技术》)
2014, No. 3, pp. 203-210 (8 pages)
Funding
Zhejiang Provincial Natural Science Foundation (LY14F020036)
Taizhou University Youth Fund (2012QN09)
Keywords
Q-learning
bounded-rationality game
pursuit
multi-index payoff