Abstract
In intelligent game confrontation scenarios, multi-agent reinforcement learning algorithms suffer from non-stationarity: an agent's policy depends not only on the environment but also on its opponents (the other agents in the environment). Predicting an opponent's strategy and intent from its interactions with the environment, and adjusting the agent's own policy accordingly, is an effective way to mitigate this problem. This paper proposes an intelligent game confrontation algorithm based on opponent action prediction that implicitly models the opponents in the environment. The algorithm learns the opponent's policy features through supervised learning and fuses them with the agent's reinforcement learning model, reducing the opponent's impact on learning stability. Simulation experiments in a 1v1 soccer environment show that the proposed algorithm effectively predicts the opponent's actions, speeds up learning convergence, and improves the agent's performance against the opponent.
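The abstract names the building blocks (a supervised opponent-action predictor, fusion with the agent's reinforcement learning model, and D3QN from the keywords) but gives no architectural detail. The following is only a minimal sketch of how those pieces are commonly written, not the authors' implementation. All function names are hypothetical, and fusion by simple concatenation is an assumption made for illustration.

```python
import math


def softmax(logits):
    """Turn raw predictor scores into a distribution over opponent actions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, observed_action):
    """Supervised loss for the opponent predictor: -log p(observed action)."""
    return -math.log(probs[observed_action])


def fuse(state_features, opponent_probs):
    """Assumed fusion step: concatenate state features and opponent features."""
    return list(state_features) + list(opponent_probs)


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]
```

In this sketch, the predicted opponent distribution from `softmax` would be appended to the agent's state features by `fuse` before the Q-network computes `dueling_q`, while `cross_entropy` on the opponent's observed actions trains the predictor alongside the reinforcement learning loss.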
Authors
韩润海
陈浩
刘权
黄健
HAN Runhai; CHEN Hao; LIU Quan; HUANG Jian (College of Intelligent Science and Technology, National University of Defense Technology, Changsha 410073, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journals (北大核心)
2023, Issue 7, pp. 190-197 (8 pages)
Computer Engineering and Applications
Keywords
opponent action prediction
dueling double deep Q-network (D3QN)
intelligent game confrontation
deep reinforcement learning