Abstract
Multi-aircraft cooperation is a key element of air combat. Handling the complex cooperative relationships among multiple entities and achieving intelligent decision-making in multi-aircraft cooperative air combat are problems that urgently need to be solved. To this end, this paper proposes a deep-reinforcement-learning-based multi-aircraft cooperative air combat decision framework (DRL-MACACDF) and, for the proximal policy optimization (PPO) algorithm, designs four algorithm enhancement mechanisms that improve the degree of cooperation among agents in multi-aircraft cooperative confrontation scenarios. Simulation experiments on a wargame platform verify the feasibility and practicability of the method. An interpretable review analysis of the confrontation process data is also carried out, and the cross-disciplinary research direction of combining reinforcement learning with traditional wargaming is discussed.
Authors
施伟
冯旸赫
程光权
黄红蓝
黄金才
刘忠
贺威
SHI Wei; FENG Yang-He; CHENG Guang-Quan; HUANG Hong-Lan; HUANG Jin-Cai; LIU Zhong; HE Wei (College of Systems Engineering, National University of Defense Technology, Changsha 410073; Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083)
Source
《自动化学报》
EI
CAS
CSCD
PKU Core Journal (北大核心)
2021, No. 7, pp. 1610-1623 (14 pages)
Acta Automatica Sinica
Funding
Supported by the National Natural Science Foundation of China (71701205, 62073333).
Keywords
Multi-aircraft cooperative air combat (多机协同空战)
Intelligent decision-making (智能决策)
Deep reinforcement learning (深度强化学习)
Proximal policy optimization (PPO) algorithm (PPO算法)
Enhancement mechanism (增强机制)