Intelligent Game Countermeasures Algorithm Based on Opponent Action Prediction

Cited by: 1

Abstract: In intelligent game confrontation scenarios, multi-agent reinforcement learning algorithms suffer from non-stationarity: an agent's policy depends not only on the environment but also on its opponents (the other agents in the environment). Predicting an opponent's strategy and intentions from its interactions with the environment, and adjusting the agent's own policy accordingly, is an effective way to alleviate this problem. This paper proposes an intelligent game confrontation algorithm based on opponent action prediction that models the opponents in the environment implicitly. The algorithm obtains the opponent's policy features through supervised learning and fuses them with the agent's reinforcement learning model, reducing the opponent's influence on learning stability. Simulation experiments in a 1v1 soccer environment show that the proposed algorithm effectively predicts the opponent's actions, accelerates learning convergence, and improves the agent's confrontation performance.
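The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It illustrates one way a supervised opponent-action predictor could be fused with a dueling double deep Q network (D3QN, named in the keywords). The layer sizes, the fusion by concatenation, and all names (`OpponentAwareDuelingQ`, `opponent_loss`, `double_q_target`) are hypothetical. Only the forward pass and the two training signals are shown; no gradient updates are performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Return randomly initialised (weights, bias) for one linear layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class OpponentAwareDuelingQ:
    """Hypothetical network: opponent predictor plus dueling Q head."""

    def __init__(self, state_dim, n_actions, n_opp_actions, hidden=16):
        self.opp_hidden = dense(state_dim, hidden)      # opponent feature layer
        self.opp_out = dense(hidden, n_opp_actions)     # opponent action logits
        self.state_hidden = dense(state_dim, hidden)    # agent's own state features
        self.value = dense(2 * hidden, 1)               # V(s) stream
        self.advantage = dense(2 * hidden, n_actions)   # A(s, a) stream

    def forward(self, states):
        w, b = self.opp_hidden
        opp_feat = np.tanh(states @ w + b)
        w, b = self.opp_out
        opp_probs = softmax(opp_feat @ w + b)
        w, b = self.state_hidden
        state_feat = np.tanh(states @ w + b)
        # Assumption: fusion is a plain concatenation of the two feature vectors.
        fused = np.concatenate([state_feat, opp_feat], axis=-1)
        w, b = self.value
        v = fused @ w + b
        w, b = self.advantage
        a = fused @ w + b
        # Dueling aggregation: Q = V + (A - mean(A)).
        q = v + a - a.mean(axis=-1, keepdims=True)
        return q, opp_probs

def opponent_loss(opp_probs, opp_actions):
    """Supervised cross-entropy between predicted and observed opponent actions."""
    picked = opp_probs[np.arange(len(opp_actions)), opp_actions]
    return float(-np.log(picked + 1e-12).mean())

def double_q_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN target: online net selects the action, target net evaluates it."""
    best = q_online_next.argmax(axis=-1)
    next_q = q_target_next[np.arange(len(best)), best]
    return rewards + gamma * (1.0 - dones) * next_q

# Usage with made-up dimensions (6 state features, 5 actions per player).
net = OpponentAwareDuelingQ(state_dim=6, n_actions=5, n_opp_actions=5)
states = rng.normal(size=(4, 6))
q_values, opp_probs = net.forward(states)
loss = opponent_loss(opp_probs, np.array([0, 1, 2, 3]))
```

In a full training loop, the cross-entropy term would be minimised on observed opponent actions while the Q head is trained toward the double-Q target. How the two objectives are weighted or scheduled is not stated in the abstract.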

Authors: HAN Runhai (韩润海); CHEN Hao (陈浩); LIU Quan (刘权); HUANG Jian (黄健) (College of Intelligent Science and Technology, National University of Defense Technology, Changsha 410073, China)

Affiliation: College of Intelligent Science and Technology, National University of Defense Technology

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2023, No. 7, pp. 190-197 (8 pages)

Keywords: opponent action prediction; dueling double deep Q network (D3QN); intelligent game confrontation; deep reinforcement learning

Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
