Abstract
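To address the dependence of traditional explicit modeling methods on large numbers of data samples, a policy bootstrapping algorithm is proposed that improves modeling efficiency by bootstrapping the sample data. In addition, to improve the accuracy of the opponent model, a subpolicy discovery algorithm is proposed that combines implicit modeling with implicit subpolicy modeling. With Leduc poker as the experimental game, two traditional methods are compared with the two algorithms proposed in this paper. The experimental results show that the policy bootstrapping algorithm improves both the efficiency of explicit modeling and the accuracy of the model. In terms of the profit gained by exploiting the opponent's weaknesses, the policy bootstrapping algorithm improves on the explicit modeling method by 84.4%, and the subpolicy discovery algorithm improves on the implicit modeling method by 128.6%.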
To address the reliance of traditional explicit modeling on large numbers of data samples, a policy bootstrapping algorithm was introduced to improve modeling efficiency by bootstrapping the sample data. Meanwhile, to enhance the accuracy of the opponent model, the implicit modeling method and the subpolicy implicit modeling method were combined into a subpolicy discovery algorithm. Leduc poker was used as the experimental game to compare two traditional methods with the two new algorithms. The results indicate that policy bootstrapping improves the efficiency of explicit modeling and the accuracy of the model. In profit gained by exploiting the opponent’s weaknesses, the policy bootstrapping algorithm improves on the explicit modeling method by 77.9%, and the subpolicy discovery algorithm improves on the implicit modeling method by 128.6%.
Authors
吴天栋
石英
WU Tiandong; SHI Ying (School of Automation, Wuhan University of Technology, Wuhan 430070, China)
Source
《河南科技大学学报(自然科学版)》
CAS
Peking University Core Journals (北大核心)
2019, No. 1, pp. 54-59, 109 (7 pages in total)
Journal of Henan University of Science and Technology: Natural Science
Funding
National Natural Science Foundation of China (61673306)
Jiangsu Provincial Science and Technology Research and Development Program (BE2016155)
Keywords
imperfect information games
opponent modeling
policy bootstrapping
implicit modeling