
An RL-based command and control algorithm for SoS confrontation simulation at the tactical level

Cited by: 1
Abstract: Traditional cognitive decision-making techniques cannot effectively cope with the uncertainty, unknowability, and complexity of the Weapon System-of-Systems (WSoS) confrontation environment. To address this, a command and control (C2) algorithm based on reinforcement learning (RL) is proposed for WSoS confrontation simulation at the tactical level. A UML architecture comprising scouting, attacking, communication, supplying, repairing, and command-and-control agent classes is presented, and a self-developed battle simulation prototype system, together with its battle scenario, is described. Building on a description of, and assumptions about, the cognitive domain of the tactical-level C2 agent, the paper details an improved Q-learning cognitive decision algorithm: parameter normalization, discretization of the Q table based on a GRBF neural network, a stride temporal-difference (TD) mechanism, and the training process of the network structure. Finally, the effectiveness of the algorithm is validated through an integrated air-ground joint SoS confrontation simulation. Extensive visual retrospective analysis of the algorithm further shows that a degree of firepower coordination and continuous tactical maneuvering are important for improving operational effectiveness and reducing damage.
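The abstract's improved Q-learning scheme pairs a Gaussian RBF (GRBF) network, standing in for a discrete Q table, with a temporal-difference update. A minimal sketch of that combination is given below; the class name, center placement, shared width, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class GRBFQApproximator:
    """Q(s, a) approximated by a Gaussian RBF network over a normalized state,
    updated with a one-step temporal-difference rule (illustrative sketch)."""

    def __init__(self, centers, width, n_actions, alpha=0.1, gamma=0.9):
        self.centers = np.asarray(centers, dtype=float)   # (n_centers, state_dim)
        self.width = width                                # shared Gaussian width
        self.weights = np.zeros((len(self.centers), n_actions))
        self.alpha, self.gamma = alpha, gamma

    def _features(self, state):
        # Gaussian radial basis activations for a (normalized) state vector
        d2 = np.sum((self.centers - state) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def q_values(self, state):
        # Linear combination of RBF features per action
        return self._features(state) @ self.weights

    def td_update(self, state, action, reward, next_state, done):
        # One-step TD target: r (terminal) or r + gamma * max_a' Q(s', a')
        phi = self._features(state)
        target = reward if done else reward + self.gamma * np.max(self.q_values(next_state))
        td_error = target - phi @ self.weights[:, action]
        self.weights[:, action] += self.alpha * td_error * phi
        return td_error
```

Because the RBF features generalize across nearby states, a single update also shifts Q estimates for neighboring states, which is the usual motivation for replacing a discrete Q table with a smooth approximator.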
Authors: YAN Xue-fei; LI Xin-ming; LIU Dong; LIU De-sheng; LI Qiang (Laboratory of Science and Technology on Complex Electronic System Simulation, Equipment Academy, Beijing 101416, China)
Source: Computer Engineering & Science (CSCD, Peking University core journal), 2018, No. 8, pp. 1511-1520 (10 pages)
Funding: Equipment Pre-research Field Foundation (61400010103); Key Laboratory Basic Research Project (DXZT-JC-ZZ-2015-007)
Keywords: weapon system of systems; battle simulation; reinforcement learning; GRBF neural network; cognitive decision-making


