A 9×9 Go computer game system using temporal difference
(Original title: 采用时间差分算法的九路围棋机器博弈系统) Cited by: 5
Abstract Computer Go is an important branch of computer game playing, and its enormous game space poses a major challenge to researchers. Current Go programs mostly rely on static evaluation search and Monte Carlo tree search. This paper introduces a temporal difference algorithm into a 9×9 Go computer game system and proposes a system model based on it. The resulting system has some self-learning ability and gradually improves its playing strength through repeated games. Actual games against a system that uses α-β search demonstrated the feasibility of the method.
Source CAAI Transactions on Intelligent Systems (《智能系统学报》), Peking University Core Journal, 2012, No. 3, pp. 278-282 (5 pages)
Funding Scientific Research Project of the Chongqing Municipal Education Commission (KJ120824); Natural Science Foundation of Chongqing (2007BB2415)
Keywords computer game; 9×9 Go; Go computer game; temporal difference
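This record does not describe the paper's actual model, so the following is only a generic sketch of the temporal difference idea the abstract refers to. It assumes a linear position evaluator and a TD(0) update applied over one finished game. The function name `td0_update`, the feature vectors, the step size `alpha`, and the discount `gamma` are all hypothetical choices made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def td0_update(
    weights: Sequence[float],
    features_seq: Sequence[Sequence[float]],
    final_reward: float,
    alpha: float = 0.1,
    gamma: float = 1.0,
) -> List[float]:
    """Run one TD(0) pass over the positions of a finished game.

    weights      -- current weights of a linear evaluator (assumed form)
    features_seq -- one feature vector per position, in game order
    final_reward -- game outcome, e.g. 1.0 for a win and 0.0 for a loss
    """
    w = list(weights)
    for t, x_t in enumerate(features_seq):
        # Current estimate of the position at step t.
        v_t = sum(wi * xi for wi, xi in zip(w, x_t))
        if t + 1 < len(features_seq):
            # Bootstrap from the estimate of the next position.
            x_next = features_seq[t + 1]
            target = gamma * sum(wi * xi for wi, xi in zip(w, x_next))
        else:
            # The last position is scored by the actual game outcome.
            target = final_reward
        delta = target - v_t  # temporal difference error
        # Gradient step for a linear evaluator: the gradient is x_t.
        w = [wi + alpha * delta * xi for wi, xi in zip(w, x_t)]
    return w
```

Calling `td0_update` once per finished game and feeding the returned weights into the next call is one simple way an evaluator can improve with repeated play, which is the kind of self-learning the abstract describes.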