Abstract
In computer board-game learning, self-play learning refers to learning that relies only on the moves played and the final win/loss result. Apart from the rules of the game, no domain knowledge is preset and no expert guidance is provided. Although self-play learning based on the minimax algorithm, α-β pruning, and Monte Carlo search has achieved excellent results, targeted research on evaluating the quality of learning samples is still lacking. This paper therefore proposes, for the first time, a method for evaluating the quality of learning samples from self-play games. The method uses a composite sample-size indicator T (a linear combination of sample repetition and sample count) to determine the size of the learning sample set. Experiments on checkers show that the method effectively controls the size of the learning sample set and greatly reduces the computational cost of generating learning samples without degrading learning performance.
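As a rough illustration of an indicator built as a linear combination of sample repetition and sample count, the sketch below may help. The abstract gives neither the weights nor the exact definition of repetition, so the function names, the repetition formula (share of samples that duplicate an earlier one), and the default weights are all assumptions made for illustration, not the paper's definitions.

```python
def repetition_rate(samples):
    """Share of samples that duplicate an earlier sample (assumed definition)."""
    if not samples:
        return 0.0
    return 1.0 - len(set(samples)) / len(samples)


def indicator_t(samples, w_rep=1.0, w_count=0.001):
    """Hypothetical T: weighted sum of repetition rate and sample count.

    w_rep and w_count are placeholder weights, not values from the paper.
    """
    return w_rep * repetition_rate(samples) + w_count * len(samples)


# Board positions are represented here as hashable strings (an assumption).
positions = ["a", "b", "a", "c"]
t_value = indicator_t(positions, w_rep=1.0, w_count=0.1)
```

Under these assumptions, sample generation could stop once T crosses a chosen threshold; the threshold and stopping rule would have to come from the paper itself.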
Authors
姬波
尤惠彬
卢红星
田欣
柳宏川
JI Bo; YOU Hui-bin; LU Hong-xing; TIAN Xin; LIU Hong-chuan (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; Fourth Generation of Industry Research Institute, Zhengzhou University, Zhengzhou 450001, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2021, Issue 3, pp. 467-471 (5 pages)
Journal of Chinese Computer Systems
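Funding
Supported by the National Key Research and Development Program of China (2018YFB1201403)
Supported by the National Natural Science Foundation of China (61772475, 61502434).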