
Improved Policy Optimization Algorithm Based on Curiosity Mechanism
Abstract When decision models are generated with reinforcement learning, complex environments and incomplete observation of state information leave the classical proximal policy optimization (PPO) algorithm with inefficient exploration and exploitation and with poorly performing policies. To address this, the paper proposes MNAEK-PPO (proximal policy optimization based on maximum number of arrival & expert knowledge), a PPO variant improved with a curiosity mechanism. To ease exploration of the policy space, the method builds a matrix of the agent's exploration frequencies during training, processes those frequencies, and feeds the result into reinforcement learning training as an intrinsic reward. Expert knowledge is also incorporated to assist the agent's decisions. Experiments in an intelligent battlefield simulation environment determine the best way to construct the intrinsic reward in MNAEK-PPO, and a series of comparative experiments is then carried out. The results show that MNAEK-PPO greatly improves exploration efficiency in the decision space and markedly improves both convergence speed and game score, offering a new approach to applying deep reinforcement learning to the generation of intelligent tactical strategies.
Authors ZHANG Qiyang (张启阳); CHEN Xiliang (陈希亮); CAO Lei (曹雷); LAI Jun (赖俊) (College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2023, No. 11, pp. 63-70 (8 pages)
Funding National Natural Science Foundation of China (61806221).
Keywords artificial intelligence; deep reinforcement learning; curiosity mechanism; knowledge transfer; policy optimization; intelligent tactics
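
The abstract describes turning an exploration frequency matrix into an intrinsic reward but does not give the formula. The sketch below illustrates the general idea only, under stated assumptions: the state is discretized into a 2-D grid, and the bonus is `beta / sqrt(N)`, a common count-based form that is not confirmed to be the construction chosen in the paper. The class name `VisitCountBonus`, the parameter `beta`, and the grid shape are all hypothetical.

```python
import numpy as np


class VisitCountBonus:
    """Count-based intrinsic reward over a discretized 2-D grid (illustrative)."""

    def __init__(self, grid_shape, beta=0.1):
        # Exploration frequency matrix: one visit counter per grid cell.
        self.counts = np.zeros(grid_shape, dtype=np.int64)
        # Weight of the intrinsic term (assumed hyperparameter).
        self.beta = beta

    def intrinsic_reward(self, cell):
        """Record a visit to `cell` and return a bonus that shrinks with visits."""
        self.counts[cell] += 1
        # Assumed form: beta / sqrt(N). The paper compares several
        # constructions; this is only one plausible candidate.
        return self.beta / np.sqrt(self.counts[cell])

    def total_reward(self, extrinsic, cell):
        """Environment reward plus the exploration bonus for `cell`."""
        return extrinsic + self.intrinsic_reward(cell)


bonus = VisitCountBonus(grid_shape=(4, 4), beta=1.0)
first = bonus.intrinsic_reward((1, 2))   # first visit to the cell
for _ in range(2):
    bonus.intrinsic_reward((1, 2))       # second and third visits
fourth = bonus.intrinsic_reward((1, 2))  # fourth visit, smaller bonus
```

In a PPO training loop, the value returned by `total_reward` would stand in for the environment reward when computing advantages, so rarely visited cells look more attractive early on and the bonus fades as they are revisited.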
