Funding: Supported by the National Natural Science Foundation of China (Grant No. 12072090).
Abstract: This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to address the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the strengths of recurrent neural networks (RNNs) in processing sequential information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Because the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence that serves as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer itself introduces inside the agent. The training curves show that the proposed RRTD3 enhances data efficiency, training speed, and training stability, and the test results confirm the advantages of the RRTD3-based guidance laws over several conventional guidance laws.
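To make the recorded-hidden-state idea concrete, the following minimal PyTorch sketch shows one way such a recurrent actor could be arranged; it is an illustration under assumed names and dimensions (SeekerActor, obs_dim, hidden_dim), not the authors' implementation.

# Minimal sketch (not the paper's code) of a TD3-style actor whose first layer
# is a GRU that consumes the sequence of seeker measurements gathered within one
# guidance cycle, and whose hidden state is recorded so it can be replayed
# during off-policy training. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SeekerActor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        # GRU processes the per-cycle sequence of seeker measurements.
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),  # bounded acceleration command
        )

    def forward(self, obs_seq: torch.Tensor, h0: torch.Tensor):
        # obs_seq: (batch, seq_len, obs_dim) -- measurements within one guidance cycle
        # h0:      (1, batch, hidden_dim)    -- recorded hidden state from the
        #                                       previous guidance cycle
        out, h1 = self.rnn(obs_seq, h0)
        action = self.head(out[:, -1])       # act on the last step of the sequence
        return action, h1                    # h1 would be stored with the transition

# Usage: the stored hidden state is fed back in at each guidance cycle, and the
# new hidden state is saved alongside the transition so training can replay it.
actor = SeekerActor(obs_dim=4, act_dim=2)
h = torch.zeros(1, 1, 128)
obs_seq = torch.randn(1, 10, 4)              # e.g., 10 seeker measurements per cycle
action, h = actor(obs_seq, h)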
Abstract: Pure proportional navigation (PPN) is suitable for endoatmospheric interception because its commanded acceleration is perpendicular to the interceptor velocity. However, if the target is much faster than the interceptor, the homing performance of PPN degrades badly. True proportional navigation (TPN) does not have this problem, but its commanded acceleration is perpendicular to the line of sight (LOS), which is not suitable for endoatmospheric interception. The commanded acceleration of differential geometric guidance commands (DGGC) is perpendicular to the interceptor velocity, while the homing performance approximates that of the LOS-referenced guidance laws (the PPN series). Therefore, DGGC is suitable for endoatmospheric interception of high-speed targets. However, target maneuver information is essential for constructing DGGC, and the resulting guidance commands are complex and may lack robustness. Through a deep analysis of the three-dimensional engagement, a new construction method for DGGC is proposed in this paper. Target maneuver information is no longer needed, and the robustness of DGGC is guaranteed, which makes the practical application of DGGC possible.
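The geometric distinction drawn above can be checked numerically. The sketch below is a hedged illustration with assumed initial states, gain, and the common vector forms of PPN and TPN (not the paper's construction); it computes both commands and verifies which vector each is perpendicular to.

# Illustrative comparison of PPN and TPN commanded-acceleration directions.
# Initial states and the navigation gain N are arbitrary assumptions.
import numpy as np

N = 3.0                                     # navigation gain (illustrative)
r_M, v_M = np.array([0., 0., 0.]), np.array([300., 50., 0.])      # interceptor
r_T, v_T = np.array([10e3, 2e3, 0.]), np.array([-900., 0., 0.])   # target

r = r_T - r_M                               # relative position (along the LOS)
v = v_T - v_M                               # relative velocity
e_los = r / np.linalg.norm(r)               # LOS unit vector
omega = np.cross(r, v) / np.dot(r, r)       # LOS angular-rate vector
v_c = -np.dot(r, v) / np.linalg.norm(r)     # closing speed

a_ppn = N * np.cross(omega, v_M)            # PPN: normal to interceptor velocity
a_tpn = N * v_c * np.cross(omega, e_los)    # TPN: normal to the line of sight

print(np.dot(a_ppn, v_M))                   # ~0: PPN command is perpendicular to velocity
print(np.dot(a_tpn, e_los))                 # ~0: TPN command is perpendicular to the LOS

The two dot products make the trade-off explicit: PPN never commands acceleration along the interceptor velocity (favorable in the atmosphere), whereas TPN points its command off the LOS regardless of the interceptor's flight direction.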