Funding: Supported by the National Natural Science Foundation of China (Grant No. 12072090).
Abstract: This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to address the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Because the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer itself introduces inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over several conventional guidance laws.
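The recurrent-policy idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' network: the GRU choice, all layer sizes, and the variable names are invented here, and the surrounding TD3 machinery (twin critics, delayed updates) is omitted. The key point shown is recording the hidden state with each transition so the recurrent layer can be re-initialized with the state it actually had at collection time, rather than with zeros, when the transition is replayed.

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Illustrative recurrent policy: consumes the sequence of seeker
    measurements gathered within one guidance cycle and carries a GRU
    hidden state across cycles (sizes are made up for this sketch)."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),  # bounded command
        )

    def forward(self, obs_seq, h_prev):
        # obs_seq: (batch, seq_len, obs_dim) -- several detections per
        # guidance cycle, since detection runs faster than guidance.
        out, h_next = self.gru(obs_seq, h_prev)
        action = self.head(out[:, -1])  # act on the last step of the cycle
        return action, h_next

actor = RecurrentActor(obs_dim=6, act_dim=3)
h = torch.zeros(1, 1, 64)      # initial hidden state
obs = torch.randn(1, 4, 6)     # 4 detections in one cycle (toy numbers)
action, h_new = actor(obs, h)

# The "recorded" idea: store the pre-step hidden state with the transition,
# so replaying it restores the recurrent context from collection time.
replay_entry = {"obs": obs, "action": action.detach(), "hidden": h}
```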
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11471029, 11101014 and 11301279), the Beijing Natural Science Foundation (Grant No. 1142002), the Science and Technology Project of Beijing Municipal Education Commission (Grant No. KM201410005010), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 12KJB110016), a CERG Grant from the Hong Kong Research Grants Council (Grant No. HKBU 202012), and an FRG Grant from Hong Kong Baptist University (Grant No. FRG2/12-13/077).
Abstract: We consider the problem of variable selection for fixed effects varying coefficient models. A variable selection procedure is developed using basis function approximations and group nonconcave penalized functions, with the fixed effects removed via suitable weight matrices. The proposed procedure simultaneously removes the fixed individual effects, selects the significant variables, and estimates the nonzero coefficient functions. With appropriate selection of the tuning parameters, an asymptotic theory for the resulting estimates is established under suitable conditions. Simulation studies are carried out to assess the performance of the proposed method, and a real data set is analyzed for further illustration.
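The basis-expansion-plus-group-penalty idea can be illustrated with a toy numerical sketch. Everything here is a stand-in: the data are simulated, the basis is a small hand-picked one rather than splines, a plain group-lasso proximal step replaces the paper's group nonconcave (SCAD-type) penalty, and the fixed-effect removal via weight matrices is omitted. The point shown is that expanding each coefficient function beta_j(u) in a basis turns selecting the function into selecting a whole group of basis coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Varying-coefficient model: y = sum_j beta_j(u) * x_j + noise.
n, p, K = 200, 5, 4  # samples, covariates, basis size (toy numbers)
u = rng.uniform(0, 1, n)
X = rng.normal(size=(n, p))
# Toy basis for beta_j(u); the paper uses spline-type basis functions.
B = np.column_stack([np.ones(n), u, np.sin(2 * np.pi * u), np.cos(2 * np.pi * u)])

# True model: only x_0 and x_1 are active, with smooth coefficients.
y = (1 + u) * X[:, 0] + np.sin(2 * np.pi * u) * X[:, 1] + 0.1 * rng.normal(size=n)

# Grouped design: block j holds x_j multiplied by each basis function,
# so beta_j(u) x_j is approximated by Z_j @ gamma_j.
Z = np.concatenate([X[:, [j]] * B for j in range(p)], axis=1)  # (n, p*K)

# Proximal gradient with a groupwise soft-threshold (group lasso), which,
# like the nonconcave penalty, zeroes out entire coefficient groups.
lam = 15.0
step = 1.0 / np.linalg.norm(Z, 2) ** 2
gamma = np.zeros(p * K)
for _ in range(3000):
    g = gamma - step * (Z.T @ (Z @ gamma - y))
    for j in range(p):
        blk = slice(j * K, (j + 1) * K)
        nrm = np.linalg.norm(g[blk])
        g[blk] = 0.0 if nrm == 0 else max(0.0, 1 - step * lam / nrm) * g[blk]
    gamma = g

active = [j for j in range(p) if np.linalg.norm(gamma[j * K:(j + 1) * K]) > 1e-8]
print("selected covariates:", active)
```

Because penalization acts on the norm of each K-dimensional block, a covariate is dropped only when all of its basis coefficients vanish together, which is what makes whole-function selection possible.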