Funding: Natural Science Basic Research Plan in Shaanxi Province of China (No. 2023-JC-QN-0733).
Abstract: This paper studies nonlinear direct data-driven control from both a theoretical and a practical engineering perspective, the latter being an unmanned aerial vehicle (UAV) formation flight system. First, from the theoretical point of view, a nonlinear closed-loop system with both a nonlinear plant and a nonlinear feed-forward controller is considered. To avoid the complex identification process for the nonlinear plant, a nonlinear direct data-driven control strategy is proposed that designs the nonlinear feed-forward controller directly from the measured input-output data sequence; its two explicit forms are a model inverse method and an approximated analysis method. Second, from the practical point of view, after a review of the UAV formation flight system, nonlinear direct data-driven control is applied to the design of the formation controller, so that the followers can track the leader's desired trajectory during one small time instant by solving only a single data-fitting problem. Since most natural phenomena are nonlinear, the direct method is argued to be the better choice. Such nonlinear systems call for corresponding system identification and control algorithms, and direct nonlinear controller design is the purpose of this paper.
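The model inverse idea in this abstract can be illustrated with a minimal sketch: the feed-forward controller is fitted directly to recorded input-output data, with no plant model identified in between. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar toy plant, the noiseless data, the polynomial feature set, and the names `plant`, `features`, `fit_inverse_controller`, and `controller`.

```python
import numpy as np

def plant(y, u):
    # Hypothetical nonlinear plant, used only to generate data and to
    # simulate tracking; the controller design never reads this formula.
    return 0.5 * y + 0.1 * y**2 + u

def features(y_next, y):
    # Assumed controller structure: polynomial in the current output,
    # linear in the output wanted at the next step.
    return np.array([1.0, y_next, y, y**2])

def fit_inverse_controller(u_data, y_data):
    # One data-fitting problem: regress u[t] on (y[t+1], y[t]) by least squares.
    phi = np.stack([features(y_data[t + 1], y_data[t])
                    for t in range(len(u_data))])
    theta, *_ = np.linalg.lstsq(phi, u_data, rcond=None)
    return theta

def controller(theta, r_next, y):
    # Feed-forward law: the input expected to move the output to r_next.
    return float(features(r_next, y) @ theta)

# Record an open-loop experiment with random bounded inputs (noiseless here).
rng = np.random.default_rng(0)
u_data = rng.uniform(-0.5, 0.5, size=200)
y_data = np.zeros(201)
for t in range(200):
    y_data[t + 1] = plant(y_data[t], u_data[t])

theta = fit_inverse_controller(u_data, y_data)

# Track a reference (standing in for a leader's desired trajectory).
reference = 0.5 * np.sin(0.2 * np.arange(1, 51))
y = 0.0
errors = []
for r in reference:
    u = controller(theta, r, y)
    y = plant(y, u)
    errors.append(abs(y - r))
max_tracking_error = max(errors)
print("fitted parameters:", theta)
print("max tracking error:", max_tracking_error)
```

In this toy case the inverse map lies exactly in the span of the chosen features, so the fit is essentially exact. With measurement noise or a richer plant the same regression would only approximate the inverse, and the feature set would need to be chosen accordingly.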
Funding: Co-supported by the National Natural Science Foundation of China (No. 52272382), the Aeronautical Science Foundation of China (No. 20200017051001), and the Fundamental Research Funds for the Central Universities, China.
Abstract: Highly intelligent Unmanned Combat Aerial Vehicle (UCAV) formations are expected to show their strengths in Beyond-Visual-Range (BVR) air combat. Although Multi-Agent Reinforcement Learning (MARL) performs well in cooperative decision-making, existing MARL algorithms struggle to converge quickly to an optimal strategy for UCAV formations in BVR air combat, where the confrontation is complicated and the reward is extremely sparse and delayed. To address this problem, this paper proposes an Advantage Highlight Multi-Agent Proximal Policy Optimization (AHMAPPO) algorithm. First, at every step, AHMAPPO records the degree to which the best formation exceeds the average of the formations in parallel environments and carries out additional advantage sampling according to it. Then, the sampling result is introduced into the update of the actor network to improve its optimization efficiency. Finally, simulation results show that, compared with some state-of-the-art MARL algorithms, AHMAPPO obtains a better strategy with fewer sample episodes in the UCAV formation BVR air combat simulation environment built in this paper, which reflects the critical features of BVR air combat. AHMAPPO significantly increases the convergence efficiency of the strategy for UCAV formation in BVR air combat, with a maximum increase of 81.5% relative to the other algorithms.
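The abstract does not state the sampling rule, so the sketch below is only one possible reading of "additional advantage sampling", not the AHMAPPO algorithm itself. It assumes the "degree" is the best parallel environment's return minus the mean return, and that this degree sets how many extra transitions from the best environment are added to the actor's batch. The function names, the `scale` and `max_extra` parameters, and the toy numbers are all hypothetical; only the clipped surrogate objective is standard PPO.

```python
import numpy as np

def sample_batch_indices(episode_returns, n_steps, rng, scale=1.0, max_extra=8):
    # Assumed "highlight" degree: best parallel environment minus the mean.
    returns = np.asarray(episode_returns, dtype=float)
    n_envs = returns.shape[0]
    best = int(np.argmax(returns))
    degree = float(returns[best] - returns.mean())
    # Hypothetical rule: the extra sample count grows with the degree, capped.
    n_extra = int(min(max_extra, np.floor(scale * degree)))
    # Base batch: every (environment, step) pair exactly once.
    env_idx, step_idx = np.divmod(np.arange(n_envs * n_steps), n_steps)
    # Extra samples: random steps drawn from the best environment only.
    extra_steps = rng.integers(0, n_steps, size=n_extra)
    env_idx = np.concatenate([env_idx, np.full(n_extra, best)])
    step_idx = np.concatenate([step_idx, extra_steps])
    return env_idx, step_idx, degree

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Standard PPO clipped surrogate objective (to be maximized).
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.minimum(ratio * advantage, clipped).mean())

rng = np.random.default_rng(0)
episode_returns = [1.0, 2.0, 6.0]     # toy returns of 3 parallel environments
n_steps = 4
advantages = rng.normal(size=(3, n_steps))
ratios = np.ones((3, n_steps))        # new/old policy probability ratios

env_idx, step_idx, degree = sample_batch_indices(episode_returns, n_steps, rng)
objective = ppo_clip_objective(ratios[env_idx, step_idx],
                               advantages[env_idx, step_idx])
print("degree:", degree, "batch size:", len(env_idx), "objective:", objective)
```

Under these assumptions, transitions from the best-performing environment are over-represented in the actor update whenever that environment stands out from the average, which is one way a sparse, delayed reward signal could be amplified.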