A Limited Rational Game Model Based on Q-learning and Its Application

Times cited: 2
Abstract: Conventional game-theoretic models assume that players are perfectly rational, an assumption that rarely matches reality; limited (bounded) rationality games describe real problems more faithfully. Boundedly rational players in an imperfect-information game gradually learn and adapt to the game's information, such as its rules, its structure, and their opponents, so the game should be modeled as a dynamically evolving process. To address this, an imperfect-information game model based on the Q-learning algorithm is proposed. Following Littman's max-min principle, strategy-selection probability distributions are established under a multi-index system; a mathematical model that combines Q-learning with the game is then constructed, in which the Q-learning mechanism drives the dynamic evolution of the game model. Finally, the model is applied to a simulation of a two-player pursuit. The simulation results show that the proposed model reproduces the pursuit scenario well.
Source: System Simulation Technology (《系统仿真技术》), 2014, No. 3, pp. 203-210 (8 pages)
Funding: Natural Science Foundation of Zhejiang Province (LY14F020036); Youth Foundation of Taizhou University (2012QN09)
Keywords: Q-learning; limited rational game; pursuit; multi-index payoff
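The abstract names Littman's max-min principle for strategy selection and Q-learning for the game's evolution, but gives no formulas. The sketch below is a minimal illustration under stated assumptions, not the paper's model: it assumes a two-player zero-sum setting in the style of Littman's minimax-Q, where the state value is V(s) = max over mixed strategies pi of min over opponent actions o of sum_a pi(a) Q(s, a, o), and each step updates Q(s, a, o) <- (1 - alpha) Q(s, a, o) + alpha (r + gamma V(s')). Scalarizing the multi-index payoff as a weighted sum is an assumption, as are the function names `maxmin_policy`, `combined_reward`, `minimax_q_update`, the table layout, and the defaults alpha = 0.1, gamma = 0.9. The max-min is approximated by a grid search over the probability simplex instead of the linear program normally used.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Policy = Tuple[float, ...]
# Hypothetical table layout: Q[state][own_action][opponent_action]
QTable = Dict[str, List[List[float]]]


def simplex_grid(n_actions: int, steps: int) -> List[Policy]:
    """Mixed strategies whose probabilities are multiples of 1/steps and sum to 1."""
    return [
        tuple(c / steps for c in counts)
        for counts in itertools.product(range(steps + 1), repeat=n_actions)
        if sum(counts) == steps
    ]


def maxmin_policy(q: Sequence[Sequence[float]], steps: int = 20) -> Tuple[Policy, float]:
    """Approximate max over pi of min over o of sum_a pi[a] * q[a][o] by grid search."""
    n_actions, n_opp = len(q), len(q[0])
    best_pi: Policy = ()
    best_v = float("-inf")
    for pi in simplex_grid(n_actions, steps):
        # Expected payoff of pi against the opponent's most damaging reply.
        worst = min(sum(pi[a] * q[a][o] for a in range(n_actions)) for o in range(n_opp))
        if worst > best_v:
            best_pi, best_v = pi, worst
    return best_pi, best_v


def combined_reward(index_rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed scalarization of a multi-index payoff: a weighted sum of the indexes."""
    return sum(w * r for w, r in zip(weights, index_rewards))


def minimax_q_update(Q: QTable, s: str, a: int, o: int, reward: float, s_next: str,
                     alpha: float = 0.1, gamma: float = 0.9, steps: int = 20) -> float:
    """One minimax-Q step: Q(s,a,o) <- (1-alpha) Q(s,a,o) + alpha (r + gamma V(s'))."""
    _, v_next = maxmin_policy(Q[s_next], steps)
    Q[s][a][o] = (1 - alpha) * Q[s][a][o] + alpha * (reward + gamma * v_next)
    return Q[s][a][o]
```

For a matching-pennies payoff table, `maxmin_policy([[1, -1], [-1, 1]])` returns `((0.5, 0.5), 0.0)`: the uniform mixed strategy and a game value of zero. The grid search is exact only when the optimal strategy lies on the grid and grows quickly with the number of actions; a practical implementation would solve the max-min as a linear program (for example with `scipy.optimize.linprog`).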