Abstract
With the development of artificial intelligence and unmanned aerial vehicle (UAV) technologies, intelligent decision-making in close-range air combat has attracted wide attention around the world. Traditional reinforcement learning applied to close-range air combat decision-making suffers from overfitting and strategy cycles. To address these problems, a training paradigm for air combat decision models based on population games is proposed. A population of multiple UAV agents is constructed, and each agent is assigned different reward weight coefficients, which gives the agents diverse risk preferences. Agents with different risk preferences are trained against one another, which effectively avoids overfitting and strategy cycles. During training, each UAV agent in the population adaptively optimizes its reward weight coefficients according to the results of its confrontations with different opponent strategies. In numerical simulation experiments, Agent 5 and Agent 3 from population game training defeated the intelligent decision models obtained by expert-system adversarial training and by self-play training with win rates of 88% and 85%, respectively, which verifies the performance of the algorithm. Further experiments demonstrate that dynamic adjustment of the weight coefficients is necessary in the population game training paradigm, and verify the generality of the proposed paradigm on heterogeneous aircraft types.
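The abstract does not state the reward terms, the weight update rule, or the match procedure, so the following is only a minimal sketch of the general idea under loudly stated assumptions. It assumes two hypothetical reward terms (an offensive one and a defensive one), replaces the air combat simulation with a random placeholder, omits the SAC policy update entirely, and swaps in a simple exploit/explore step in the style of population-based training for the paper's adaptive weight optimization. All names (`Agent`, `attack_w`, `survive_w`, `play_match`, `adapt_weights`) are illustrative and not taken from the paper.

```python
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Agent:
    name: str
    attack_w: float   # hypothetical weight on an offensive reward term
    survive_w: float  # hypothetical weight on a defensive reward term
    wins: int = 0
    games: int = 0

    @property
    def win_rate(self) -> float:
        return self.wins / self.games if self.games else 0.0


def shaped_reward(agent: Agent, attack_term: float, survive_term: float) -> float:
    """Weighted sum of reward terms; the weights encode the agent's risk preference."""
    return agent.attack_w * attack_term + agent.survive_w * survive_term


def play_match(a: Agent, b: Agent, rng: random.Random) -> Agent:
    """Placeholder for one simulated engagement: the winner is drawn at random.
    A real setup would roll out both policies in an air combat simulator."""
    return a if rng.random() < 0.5 else b


def round_robin(population: list, rng: random.Random) -> None:
    """Every agent meets every other agent once; counters are reset first."""
    for agent in population:
        agent.wins = agent.games = 0
    for a, b in combinations(population, 2):
        play_match(a, b, rng).wins += 1
        a.games += 1
        b.games += 1


def adapt_weights(population: list, rng: random.Random, noise: float = 0.1) -> None:
    """Assumed update rule (not the paper's): the weakest agent copies the
    strongest agent's weights with a small perturbation, then renormalizes."""
    ranked = sorted(population, key=lambda ag: ag.win_rate, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if worst is best:
        return
    w = min(max(best.attack_w + rng.uniform(-noise, noise), 0.05), 0.95)
    worst.attack_w, worst.survive_w = w, 1.0 - w


def train(n_agents: int = 5, generations: int = 3, seed: int = 0) -> list:
    rng = random.Random(seed)
    # Spread the initial weights so the agents start with different risk preferences.
    population = [
        Agent(f"agent_{i + 1}", (i + 1) / (n_agents + 1), 1 - (i + 1) / (n_agents + 1))
        for i in range(n_agents)
    ]
    for _ in range(generations):
        round_robin(population, rng)    # the policy (SAC) update would happen here
        adapt_weights(population, rng)
    return population


if __name__ == "__main__":
    for ag in train():
        print(ag.name, round(ag.attack_w, 3), round(ag.survive_w, 3), ag.win_rate)
```

Because the match outcome here is random, the printed win rates carry no meaning; the sketch only shows where per-agent reward weights and their adjustment would sit in a population training loop.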
Authors
王宝来
高显忠
谢涛
侯中喜
WANG Baolai; GAO Xianzhong; XIE Tao; HOU Zhongxi (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China)
Source
《航空学报》
EI
CAS
CSCD
PKU Core Journal
2024, No. 12, pp. 169-184 (16 pages)
Acta Aeronautica et Astronautica Sinica
Funding
National Natural Science Foundation of China (61903369, 11602298)
Natural Science Foundation of Hunan Province (2018JJ3587)
Keywords
close-range air combat
intelligent decision-making
reinforcement learning
population game
SAC algorithm