
Framework of wargaming decision-making methods based on deep reinforcement learning (基于深度强化学习的兵棋推演决策方法框架)

Cited by: 12
Abstract: To address automated adversarial play in wargaming, this paper proposes building confrontation strategies from deep learning networks and reinforcement learning models. Drawing on the strengths of deep reinforcement learning and a multi-source, hierarchical description of the battlefield situation, it proposes a battlefield situation representation method for intelligent gaming. It then combines the principle of commanding operations by level and by domain with the modular, layered architectures used in real-time strategy games to propose a hierarchical, modular deep reinforcement learning framework that governs how each decision-making agent interacts with the battlefield environment and how confrontation strategies are generated. To meet the highly real-time response demands of actual operations, it proposes compressed deep reinforcement learning to speed up model output. To improve adaptability to different environments, it proposes using deep transfer learning to improve the model's generalization ability.
Authors: CUI Wenhua (崔文华), LI Dong (李东), TANG Yubo (唐宇波), LIU Shaojun (柳少军); National Defense University, Beijing 100091, China
Affiliation: National Defense University (国防大学)
Source: National Defense Technology (《国防科技》), 2020, No. 2, pp. 113-121 (9 pages)
Keywords: wargaming; deep reinforcement learning; situation representation; compressed learning methods; deep transfer learning
