Intelligent Game Countermeasures Algorithm Based on Opponent Action Prediction (cited by: 1)
Abstract: In intelligent game confrontation scenarios, multi-agent reinforcement learning algorithms suffer from non-stationarity: an agent's policy depends not only on the environment but also on its opponents (the other agents in the environment). Predicting an opponent's strategy and intent from its interactions with the environment, and adjusting the agent's own policy accordingly, is an effective way to mitigate this problem. This paper proposes an intelligent game confrontation algorithm based on opponent action prediction that models the opponents in the environment implicitly. The algorithm learns the opponent's policy features through supervised learning and fuses them with the agent's reinforcement learning model, reducing the opponent's impact on learning stability. Simulation experiments in a 1v1 soccer environment show that the proposed algorithm effectively predicts the opponent's actions, accelerates learning convergence, and improves the agent's confrontation performance.
Authors: HAN Runhai (韩润海), CHEN Hao (陈浩), LIU Quan (刘权), HUANG Jian (黄健); College of Intelligent Science and Technology, National University of Defense Technology, Changsha 410073, China
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2023, No. 7, pp. 190-197 (8 pages)
Keywords: opponent action prediction; dueling double deep Q network (D3QN); intelligent game confrontation; deep reinforcement learning
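The abstract describes two ingredients: a supervised predictor of the opponent's action, and a D3QN agent that consumes the predictor's output. The sketch below illustrates both in plain Python under several assumptions that are not taken from the paper: the predictor is a linear softmax classifier (the paper's model is not specified in the abstract), fusion is done by simple concatenation, and all names (`OpponentPredictor`, `fuse`, `dueling_q`, `double_dqn_target`) are hypothetical. It is meant only to make the data flow concrete, not to reproduce the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class OpponentPredictor:
    """Hypothetical linear softmax classifier: state features -> opponent action."""

    def __init__(self, n_features, n_actions, lr=0.5):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def predict(self, x):
        """Return a probability distribution over the opponent's actions."""
        logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        return softmax(logits)

    def update(self, x, action):
        """One SGD step on the cross-entropy loss for an observed opponent action."""
        probs = self.predict(x)
        for a, row in enumerate(self.w):
            grad = probs[a] - (1.0 if a == action else 0.0)
            for i, xi in enumerate(x):
                row[i] -= self.lr * grad * xi
        return -math.log(probs[action])  # loss before the update


def fuse(state_features, opponent_probs):
    """Assumed fusion: concatenate the state with the predicted opponent action distribution."""
    return list(state_features) + list(opponent_probs)


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online net selects the action, the target net evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


if __name__ == "__main__":
    # Toy data: the opponent's action equals the index of the active state feature.
    predictor = OpponentPredictor(n_features=2, n_actions=2)
    for _ in range(50):
        predictor.update([1.0, 0.0], 0)
        predictor.update([0.0, 1.0], 1)
    agent_input = fuse([1.0, 0.0], predictor.predict([1.0, 0.0]))
    print(agent_input)
```

In a full agent, the fused vector would be the input to the Q-network whose value and advantage heads feed `dueling_q`, and `double_dqn_target` would supply the regression target for each sampled transition.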