
Combat Behavior Evaluation Based on Hierarchical Episodic Meta-Deep Reinforcement Learning (基于层次情节性元强化学习的对抗行为评估)

Cited by: 2
Abstract: Evaluating enemy combat behavior with reinforcement learning can raise the level of intelligence in simulation and wargaming deduction, but the training speed of reinforcement learning algorithms has become the key constraint on their practical military application. To accelerate learning, enemy combat behavior evaluation is first modeled as a multi-task reinforcement learning problem, and environment knowledge and experience are integrated into the learning algorithm. On this basis, an enemy combat behavior evaluation method based on Hierarchical Episodic Meta-Deep Reinforcement Learning (HE Meta DRL) is proposed, in which the mechanisms cooperate to speed up reinforcement learning. The structure of the HE Meta DRL agent is designed and its concrete workflow is given. An episodic memory system based on the Differentiable Neural Dictionary (DND) is adopted to address the problem of incremental parameter adjustment; this episodic memory is layered onto a Long Short-Term Memory (LSTM) working memory so that the hidden activations of the LSTM from previous encounters can be reinstated. Finally, the method is tested and verified on the OpenAI Gym platform and an aircraft attack-defense intelligent game simulation platform. The results show that HE Meta DRL performs well on the CartPole-v0 task, the episodic two-step task, and the enemy combat behavior evaluation task, achieving the goal of accelerating reinforcement learning through the cooperation of hierarchical episodic DRL and meta-RL.
Authors: NIE Kai (聂凯); MENG Qing-hai (孟庆海), Unit 91550 of PLA, Dalian 116023, China
Affiliation: Chinese People's Liberation Army
Source: Command Control & Simulation (《指挥控制与仿真》), 2021, Issue 2, pp. 65-71 (7 pages)
Keywords: simulation deduction; behavior evaluation; reinforcement learning; meta-learning; episodic deep reinforcement learning; hierarchical
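
The abstract describes two cooperating mechanisms: a DND-based episodic memory and the reinstatement of LSTM hidden activations from previous encounters. The sketch below is a minimal, hypothetical PyTorch illustration of those two ideas only; it is not the authors' implementation, and the class names, the scalar reinstatement gate, and all dimensions are assumptions made for illustration. The DND lookup returns an inverse-distance-weighted average of the values of the k nearest stored keys, and the agent adds the recalled cell state to the LSTM cell state before each step.

```python
# Hypothetical sketch of a DND-style episodic memory paired with an LSTM
# working memory, illustrating the mechanisms named in the abstract.
import torch
import torch.nn as nn


class DND:
    """Differentiable Neural Dictionary: an append-only key/value store
    queried with a kernel-weighted k-nearest-neighbour lookup."""

    def __init__(self, key_dim, value_dim, capacity=5000, k=8, delta=1e-3):
        self.keys = torch.empty(0, key_dim)
        self.values = torch.empty(0, value_dim)
        self.capacity, self.k, self.delta = capacity, k, delta

    def write(self, key, value):
        # Append the new pair, evicting the oldest entries beyond capacity.
        self.keys = torch.cat([self.keys, key.detach().unsqueeze(0)])[-self.capacity:]
        self.values = torch.cat([self.values, value.detach().unsqueeze(0)])[-self.capacity:]

    def read(self, query):
        # Weighted average of the k nearest values, with w_i proportional
        # to 1 / (distance_i + delta).
        if self.keys.shape[0] == 0:
            return torch.zeros(self.values.shape[1])
        dists = torch.cdist(query.unsqueeze(0), self.keys).squeeze(0)
        d, idx = torch.topk(dists, min(self.k, self.keys.shape[0]), largest=False)
        w = 1.0 / (d + self.delta)
        w = w / w.sum()
        return (w.unsqueeze(1) * self.values[idx]).sum(dim=0)


class EpisodicLSTMAgent(nn.Module):
    """LSTM working memory whose cell state is partially reinstated from
    episodic memory whenever a previously seen context recurs."""

    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.lstm = nn.LSTMCell(obs_dim, hidden_dim)
        self.policy = nn.Linear(hidden_dim, n_actions)
        self.value = nn.Linear(hidden_dim, 1)
        self.memory = DND(key_dim=obs_dim, value_dim=hidden_dim)
        self.gate = nn.Parameter(torch.tensor(0.5))  # reinstatement strength

    def forward(self, obs, state):
        h, c = state                      # each of shape (1, hidden_dim)
        recalled = self.memory.read(obs)  # shape (hidden_dim,)
        # Blend the recalled cell state into the current one (reinstatement).
        c = c + torch.sigmoid(self.gate) * recalled.unsqueeze(0)
        h, c = self.lstm(obs.unsqueeze(0), (h, c))
        # Store the current context together with the resulting cell state.
        self.memory.write(obs, c.squeeze(0))
        return self.policy(h), self.value(h), (h, c)


# Minimal usage with made-up dimensions (CartPole-like observation of size 4):
agent = EpisodicLSTMAgent(obs_dim=4, hidden_dim=32, n_actions=2)
h, c = torch.zeros(1, 32), torch.zeros(1, 32)
logits, value, (h, c) = agent(torch.zeros(4), (h, c))
```

In the paper's setting, the stored values would correspond to activity patterns saved per episode and the keys to contextual cues, so that re-encountering a context restores the working-memory state acquired earlier; the hierarchical and meta-learning components described in the abstract are beyond the scope of this sketch.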
