Online opponent exploitation method based on particle swarm optimization for Texas Hold’em (cited by: 1)
Abstract: In Texas Hold’em, exploiting an opponent who has weaknesses yields larger income than playing an equilibrium strategy. However, exploiting a brand-new opponent efficiently under online conditions remains a major challenge. Existing methods usually sidestep the problem through offline training and online adaptation: learning or evolutionary methods are used, with massive offline training, to obtain a model that can adapt to different opponents during competition, rather than actively optimizing its own policy online against a new opponent. To achieve effective opponent exploitation through active online policy optimization, a policy optimization method based on particle swarm optimization (PSO) with a time-dimension definition of particles is proposed. It brings the idea of online policy optimization to Texas Hold’em, a game with strong randomness, in order to exploit the opponent and maximize income in online competition. To address the problems that fitness computation is affected by random luck and that targeted policies against some opponents are hard to optimize, a modified PSO algorithm called BR-PSO (best replacement-PSO) is proposed, based on local optimal solution replacement and global optimal solution replacement. Experimental results show that, for opponents that are hard to counter with the standard PSO, the proposed method effectively finds targeted policies that maximize opponent exploitation, and the income of the optimized policy is comparable to that of an AI based on hand prediction.
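The abstract's starting point, standard PSO applied to a policy whose fitness is measured through noisy play, can be illustrated with a minimal sketch. This is not the paper's BR-PSO and not its time-dimension particle definition, neither of which is specified in the abstract. The two-parameter policy, the `noisy_payoff` stand-in for average winnings, the target values, and all function names and hyperparameters are assumptions made for illustration only.

```python
import random

def noisy_payoff(policy, rng, hands=200):
    """Hypothetical stand-in for average winnings over a batch of hands.

    The 'true' value peaks at an assumed best-response policy; Gaussian
    noise that shrinks with the number of hands mimics card luck.
    """
    target = [0.7, 0.2]  # assumed optimum, for illustration only
    true_value = -sum((p - t) ** 2 for p, t in zip(policy, target))
    noise = rng.gauss(0.0, 1.0) / hands ** 0.5
    return true_value + noise

def pso(fitness, dim, rng, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Standard global-best PSO that maximizes fitness over [0, 1]^dim."""
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, rng) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep policy parameters inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i], rng)
            # A single lucky evaluation can lock in a poor policy as "best";
            # this is the noise problem the abstract points to.
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

if __name__ == "__main__":
    best, fit = pso(noisy_payoff, dim=2, rng=random.Random(0))
    print(best, fit)
```

Because each stored best fitness is a single noisy estimate, the reported best value tends to overstate the policy's true payoff. This behavior is what motivates replacing stored best solutions, although the sketch does not attempt to reproduce how BR-PSO does so.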
Authors: HU Zhen-zhen; CHEN Shao-fei; YUAN Wei-lin; LI Peng; CHEN Jing (College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China)
Source: Control and Decision (《控制与决策》); indexed in EI, CSCD, and the Peking University Core Journals list; 2024, No. 5, pp. 1687-1696 (10 pages)
Funding: National Natural Science Foundation of China (61806212, 62376280).
Keywords: particle swarm optimization; policy optimization; optimal solution replacement; opponent exploitation; online competition; Texas Hold’em