Abstract
Evaluating adversary combat behavior with reinforcement learning can make simulation-based wargaming more intelligent, but the training speed of reinforcement learning algorithms is the key constraint on practical military application. To speed up learning, adversary combat behavior evaluation is first modeled as a multi-task reinforcement learning problem, and knowledge and experience of the environment are integrated into the learning algorithm. On this basis, an evaluation method based on Hierarchical Episodic Meta-Deep Reinforcement Learning (HE Meta DRL) is proposed, in which the components cooperate to accelerate learning. The structure of the HE Meta DRL agent is designed and its workflow is described. An episodic memory system based on the Differentiable Neural Dictionary (DND) is adopted to address the problem of incremental parameter adjustment. This episodic memory is layered onto a Long Short-Term Memory (LSTM) network that serves as working memory, so that LSTM activation patterns from previous encounters can be reinstated. The method is validated on the OpenAI Gym platform and on an aircraft attack-defense intelligent game platform. The results show that HE Meta DRL performs well on the CartPole-v0 (inverted pendulum) task, the episodic two-step task, and the adversary combat behavior evaluation task, and that it achieves the goal of accelerating reinforcement learning through the cooperation of hierarchical episodic DRL and meta-RL.
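The abstract does not give the internals of the DND-based episodic memory. As a rough illustration only, the sketch below shows one common form of a DND-style read: stored values are averaged over the k nearest keys, weighted by an inverse-distance kernel. The class name `EpisodicMemory`, the parameters `capacity`, `delta`, and `k`, and the oldest-first eviction rule are all assumptions made for this example, not details taken from the paper. The sketch uses plain Python lists and is not differentiable. In an agent like the one described, the stored values might be LSTM cell states saved under a context key.

```python
class EpisodicMemory:
    """Toy key-value memory with a DND-style kernel-weighted read.

    Hypothetical illustration: names and defaults are assumptions,
    not the authors' implementation.
    """

    def __init__(self, capacity=100, delta=1e-3):
        self.capacity = capacity  # maximum number of stored (key, value) pairs
        self.delta = delta        # small constant that avoids division by zero
        self.keys = []
        self.values = []

    def write(self, key, value):
        """Store a pair; overwrite an identical key, evict the oldest when full."""
        key, value = list(key), list(value)
        if key in self.keys:
            self.values[self.keys.index(key)] = value
            return
        if len(self.keys) >= self.capacity:
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(key)
        self.values.append(value)

    def read(self, query, k=3):
        """Return the kernel-weighted average of the k nearest stored values."""
        if not self.keys:
            return None
        scored = []
        for key, value in zip(self.keys, self.values):
            dist2 = sum((q - x) ** 2 for q, x in zip(query, key))
            # Inverse-distance kernel: closer keys receive larger weights.
            scored.append((1.0 / (dist2 + self.delta), value))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:k]
        total = sum(weight for weight, _ in top)
        dim = len(top[0][1])
        return [sum(w * v[d] for w, v in top) / total for d in range(dim)]


if __name__ == "__main__":
    memory = EpisodicMemory(capacity=2)
    memory.write([0.0, 0.0], [1.0])    # e.g. context embedding -> saved state
    memory.write([10.0, 10.0], [5.0])
    print(memory.read([0.1, 0.0], k=1))
```

Because a lookup only reads stored entries, new experience can be used immediately after a single write, without waiting for gradient updates to network weights. This is the kind of fast, non-incremental learning that an episodic memory is meant to provide.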
Authors
聂凯
孟庆海
NIE Kai; MENG Qing-hai (Unit 91550 of PLA, Dalian 116023, China)
Source
《指挥控制与仿真》
2021, No. 2, pp. 65-71 (7 pages)
Command Control & Simulation
Keywords
simulation deduction
behavior evaluation
reinforcement learning
meta-learning
episodic deep reinforcement learning
hierarchical