Abstract
Traditional electronic warfare is gradually evolving into intelligent electronic warfare that integrates artificial intelligence. The main scenario studied is reinforcement-learning-based cooperative electronic countermeasures by multiple unmanned aerial vehicles (UAVs). Multi-agent reinforcement learning algorithms have difficulty converging in complex, high-dimensional state and action spaces. To address this, a multi-agent double dueling policy gradient algorithm based on prioritized experience replay (PerMaD4) is proposed. The algorithm introduces a prioritized experience replay mechanism and adds two network structures. A dueling Critic network balances the relationship between actions and values, and a double Critic network reduces the estimation uncertainty of a single Critic network. Simulation results show that, in the same simulation scenario, PerMaD4 converges better than the other reinforcement learning algorithms compared and improves task completion by 8.9%.
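The abstract names three ingredients but does not say how they are implemented. The following is a minimal NumPy sketch of each under stated assumptions, not the authors' code. It covers three pieces:

- a proportional prioritized replay buffer, where priority is |δ| + ε, sampling probability is proportional to p^α, and importance-sampling weights use exponent β;
- a dueling aggregation Q = V + A - mean(A), which assumes a discrete action set;
- a double-critic target that takes the minimum of two target critics. This is a TD3-style choice, and the paper may combine its two critics differently.

The names `PrioritizedReplay`, `dueling_q` and `double_critic_target`, and all default hyperparameters, are hypothetical.

```python
import numpy as np


class PrioritizedReplay:
    """Proportional prioritized experience replay (simplified, O(N) sampling).

    Hypothetical sketch: alpha, beta and eps defaults are illustrative only,
    not values reported in the paper.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so each is likely replayed soon.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition  # overwrite the oldest slot (ring buffer)
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance-sampling weights, normalised by the batch maximum for stability.
        weights = (n * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority is the absolute TD error plus a small constant.
        self.priorities[idx] = np.abs(td_errors) + self.eps


def dueling_q(value, advantages):
    """Dueling aggregation Q = V + (A - mean A), assuming a discrete action set.

    value: shape (batch,); advantages: shape (batch, n_actions).
    """
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)


def double_critic_target(reward, done, q1_next, q2_next, gamma=0.95):
    """TD target using the smaller of two target-critic estimates (TD3-style assumption)."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)
```

The buffer uses O(N) sampling for readability. When the buffer holds many transitions, a sum-tree is the usual replacement.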
Authors
YANG Yang (杨洋)
WANG Ye (王烨)
KANG Dayong (康大勇)
CHEN Jiayu (陈嘉玉)
LI Jiang (李姜)
ZHAO Huadong (赵华栋)
Affiliations:
- Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Key Laboratory of Electro-Optical Countermeasures Test & Evaluation Technology, Luoyang 471000, China
Source
Journal of Ordnance Equipment Engineering (《兵器装备工程学报》)
CAS
CSCD
Peking University Core Journal (北大核心)
2024, No. 7, pp. 1-10 (10 pages)
Funding
National Natural Science Foundation of China (Grant No. 61977059).
Keywords
collaborative decision-making
reinforcement learning
policy gradient
electronic countermeasure simulation