基于互信息的智能博弈对抗分层强化学习研究

Research on Hierarchical Reinforcement Learning of Intelligent Game Confrontation Based on Mutual Information
Abstract: Intelligent gaming is an active topic in current artificial intelligence research, and as the field has advanced it has been applied increasingly widely in combat command. Most notably, DARPA in the United States uses artificial intelligence to give commanders comprehensive strategic support for battlefield decisions. How to use artificial intelligence to simulate confrontation in a battlefield environment is one part of this research. Although current agents can keep improving by collecting rewards, they usually choose the strategy with the highest payoff at that moment, based on immediate reward. In a real battlefield, however, some decisions yield no immediate payoff yet later improve the overall situation and lead to better results. To address this problem, hierarchical reinforcement learning is used to train agents for intelligent gaming and is applied to simulate a virtual commander in a simple battlefield environment, and MI-A3C, a hierarchical reinforcement learning algorithm for intelligent game confrontation based on mutual information, is proposed. In the simulated battlefield environment, MI-A3C achieves a win rate of 86.7% and completes the main tasks, and the experiments also reveal some decisions that favor long-term payoff.
Authors: WEI Jing-yi (魏竞毅); LAI Jun (赖俊); CHEN Xi-liang (陈希亮) (School of Command Information System, Army Engineering University, Nanjing 210007, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2022, No. 9, pp. 142-147 (6 pages)
Keywords: intelligent game; reinforcement learning; mutual information; hierarchical; A3C algorithm; unit command
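
The abstract contrasts choosing by immediate reward with decisions that only pay off later. The sketch below illustrates that contrast with the standard discounted return. It is not the MI-A3C algorithm itself: the plan names, reward sequences, and discount factor `gamma` are all made-up assumptions for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Standard discounted return: sum of gamma**t * rewards[t]."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical reward sequences for two candidate plans (illustrative numbers only).
plans = {
    "attack_now": [1.0, 0.0, 0.0, 0.0],        # small payoff right away
    "reposition_first": [0.0, 0.0, 0.0, 5.0],  # no payoff until the final step
}

# A myopic agent compares only the first-step reward.
greedy_choice = max(plans, key=lambda name: plans[name][0])

# A far-sighted agent compares the discounted return of the whole sequence.
farsighted_choice = max(plans, key=lambda name: discounted_return(plans[name]))

print("greedy:", greedy_choice)
print("far-sighted:", farsighted_choice)
```

With these assumed numbers the myopic rule prefers the plan with the immediate payoff, while the return-based rule prefers the plan whose reward arrives later, which is the kind of decision the abstract says a hierarchical method should be able to find.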