Funding: Supported by the National Natural Science Foundation of China (Grant No. 12072090).
Abstract: This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm for constructing guidance laws to intercept endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Because recurrent neural networks (RNNs) are well suited to processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Since the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer introduces inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
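The grouping of seeker measurements into one sequence per guidance cycle can be illustrated with a minimal sketch. The function name `group_measurements`, the two-element measurement tuples, and the ratio of five samples per cycle are assumptions made for illustration; the abstract does not give the actual rates or measurement layout.

```python
from typing import List, Sequence


def group_measurements(
    samples: Sequence[Sequence[float]], samples_per_cycle: int
) -> List[List[List[float]]]:
    """Split a stream of seeker samples into one sequence per guidance cycle.

    Each returned sequence would be the input to a recurrent policy network
    for a single guidance step. An incomplete trailing cycle is dropped.
    """
    if samples_per_cycle <= 0:
        raise ValueError("samples_per_cycle must be positive")
    n_cycles = len(samples) // samples_per_cycle
    return [
        [list(s) for s in samples[i * samples_per_cycle:(i + 1) * samples_per_cycle]]
        for i in range(n_cycles)
    ]


# Hypothetical stream: 12 samples, 5 samples per guidance cycle (assumed ratio).
stream = [(float(k), float(-k)) for k in range(12)]
sequences = group_measurements(stream, samples_per_cycle=5)
```

Here the last two samples do not fill a complete cycle and are discarded; a real implementation might instead buffer them for the next cycle.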
Abstract: The missile interception problem can be regarded as a two-person zero-sum differential game, whose solution depends on the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method. For online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that does not use any knowledge of the system dynamics. Then, a neural-network-based online adaptive critic implementation scheme of the off-policy IRL algorithm is presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. As the method is model-free, interception can be achieved by measuring system data online. The effectiveness of the CIIG law is verified through two missile-target engagement scenarios.
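For orientation, a commonly used form of the HJI equation for a zero-sum game is sketched below. The input-affine dynamics $\dot{x} = f(x) + g(x)u + k(x)w$, the state penalty $Q(x)$, the symmetric positive-definite weight $R$, and the attenuation level $\gamma$ are assumptions for illustration; the abstract does not state the paper's own formulation.

```latex
% Value function of the assumed zero-sum game (control u minimizes, disturbance w maximizes)
V(x_0) = \min_{u}\max_{w} \int_{0}^{\infty}
         \left( Q(x) + u^{\top} R\, u - \gamma^{2} w^{\top} w \right) \mathrm{d}t

% Saddle-point policies obtained from the stationarity conditions of the Hamiltonian
u^{*} = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V, \qquad
w^{*} = \tfrac{1}{2\gamma^{2}} k(x)^{\top} \nabla V

% Resulting HJI equation
0 = Q(x) + \nabla V^{\top} f(x)
    - \tfrac{1}{4} \nabla V^{\top} g(x) R^{-1} g(x)^{\top} \nabla V
    + \tfrac{1}{4\gamma^{2}} \nabla V^{\top} k(x) k(x)^{\top} \nabla V
```

The quadratic terms in $\nabla V$ are what make the equation nonlinear and motivate iterative schemes such as the SPUA mentioned above.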
Abstract: Taking discourse learning in the new senior high school English textbook published by the People's Education Press as an example, this paper combines the "six-dimensional guidance" deep reading strategy with six skill-training strategies (memory, understanding, application, analysis, evaluation, and creation) to cultivate the profundity, logic, flexibility, sensitivity, criticality, and originality of students' thinking. It also promotes the practical implementation in classroom teaching of senior high school English deep reading aimed at cultivating thinking quality, and realizes the transformation from "conventional reading" to "deep reading" that reflects the core literacy of the discipline.
Funding: Partially supported by the Spanish Ministry of Economy and Competitiveness under grant number DPI2015-64170-R (MINECO/FEDER).
Abstract: This paper presents an Iterative Learning Control design applied to the homing guidance of missiles against maneuvering targets. According to numerical experiments, although the control energy increases relative to a previously published base controller used for comparison, this strategy, which is simple to realize, reduces the time needed to reach the head-on condition for target destruction. This is important for minimizing the missile's lateral force level when engaging in the pursuit of hypersonic targets.
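The general idea of iterative learning control can be sketched with a textbook P-type update on a toy scalar plant. This is not the controller from the paper: the plant coefficients `a` and `b`, the learning gain, and the ramp reference are all hypothetical values chosen so that the update converges.

```python
def simulate(u, a=0.3, b=1.0):
    """Run the toy plant x(t+1) = a*x(t) + b*u(t) from x(0) = 0.

    Returns the outputs x(1), ..., x(N) for an input sequence of length N.
    """
    x, outputs = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        outputs.append(x)
    return outputs


def ilc(reference, trials=50, gain=0.5):
    """P-type ILC: u_{k+1}(t) = u_k(t) + gain * e_k(t+1).

    Returns the final input sequence and the maximum tracking error of
    each trial, so the trial-to-trial improvement can be inspected.
    """
    u = [0.0] * len(reference)
    max_errors = []
    for _ in range(trials):
        y = simulate(u)
        # errors[t] is the error at time t+1, which is the output u(t) influences first
        errors = [r - y_t for r, y_t in zip(reference, y)]
        max_errors.append(max(abs(e) for e in errors))
        u = [u_t + gain * e_t for u_t, e_t in zip(u, errors)]
    return u, max_errors


# Hypothetical ramp reference for x(1), ..., x(10).
reference = [t / 10.0 for t in range(1, 11)]
u_final, max_errors = ilc(reference)
```

With these assumed values the learning condition |1 - gain*b| < 1 holds, so the tracking error shrinks from one trial to the next.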
Funding: Supported by the National Natural Science Foundation of China (62003021, 91212304).
Abstract: The guidance strategy is a critical factor in determining the striking effect of a missile operation. A novel guidance law is presented that exploits deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and to avoid the threat posed by opposing obstacles. To attenuate chattering of the acceleration command, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results show that a missile using the proposed method can hit the target successfully and stay away from the threat areas effectively.
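A reward of the kind described, penalizing LOS rate, large actions, and entry into threat areas, might look like the sketch below. The weights `w_los` and `w_act`, the fixed `threat_penalty`, and the circular threat model are hypothetical; the abstract does not specify the paper's reward terms or their values.

```python
import math


def guidance_reward(los_rate, accel_cmd, missile_pos, threats,
                    w_los=1.0, w_act=0.01, threat_penalty=10.0):
    """Shaped step reward with three illustrative terms.

    los_rate:    line-of-sight angle rate (rad/s); smaller magnitude is better
    accel_cmd:   commanded acceleration; the squared penalty discourages chattering
    missile_pos: (x, y) position of the missile
    threats:     iterable of (cx, cy, radius) circles to keep away from
    """
    reward = -w_los * abs(los_rate)
    reward -= w_act * accel_cmd ** 2
    for cx, cy, radius in threats:
        if math.hypot(missile_pos[0] - cx, missile_pos[1] - cy) < radius:
            reward -= threat_penalty
    return reward


# Same LOS rate and command, evaluated outside and inside an assumed threat circle.
threat_zones = [(10.0, 10.0, 3.0)]
reward_safe = guidance_reward(0.1, 2.0, (0.0, 0.0), threat_zones)
reward_inside = guidance_reward(0.1, 2.0, (10.0, 9.0), threat_zones)
```

The action penalty is quadratic here so that small corrections are cheap and large swings are costly, which is one simple way to discourage chattering.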
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62103432) and the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (Grant No. 20210108).
Abstract: An impact point prediction (IPP) guidance method based on supervised learning is proposed to address the problem of precise guidance for ballistic missiles under high-maneuver penetration conditions. An accurate ballistic trajectory model is used to generate training samples, and ablation experiments are conducted to determine the mapping relationship between the flight state and the impact point. The impact point coordinates are decoupled to improve prediction accuracy, and the sigmoid activation function is modified to improve prediction efficiency. An IPP neural network model is thereby established that resolves the conflict between the accuracy and the speed of the IPP. To account for the performance deviation of the divert control system, the mapping relationship between the guidance parameters and the impact deviation is analysed based on the variational principle. In addition, a fast iterative model of the guidance parameters is designed with reference to the Newton iteration method, which solves the nonlinear, strongly coupled problem of the guidance parameter solution. Monte Carlo simulation results show that the prediction accuracy of the impact point is high, with a 3σ prediction error of 4.5 m, and that the guidance method is robust, with a 3σ error of 7.5 m. On the STM32F407 single-chip microcomputer, a single IPP takes about 2.374 ms and a single guidance solution takes about 9.936 ms, which indicates good real-time performance and practical engineering value.
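The Newton iteration that the fast iterative model is said to draw on can be sketched for a single parameter. The residual function `miss_distance` below is a toy cubic standing in for the (unspecified) mapping from a guidance parameter to impact deviation, and the central-difference slope is an assumption made so the sketch needs no analytic derivative.

```python
def newton_solve(residual, p0, tol=1e-9, max_iter=20, h=1e-6):
    """Find p with residual(p) close to 0 by Newton's method.

    The derivative is approximated by a central difference with step h.
    Returns the last iterate if max_iter is reached without meeting tol.
    """
    p = p0
    for _ in range(max_iter):
        r = residual(p)
        if abs(r) < tol:
            return p
        slope = (residual(p + h) - residual(p - h)) / (2.0 * h)
        if slope == 0.0:
            raise ZeroDivisionError("flat residual; Newton step undefined")
        p -= r / slope
    return p


def miss_distance(p):
    """Hypothetical impact deviation as a function of one guidance parameter."""
    return p ** 3 + 2.0 * p - 5.0


root = newton_solve(miss_distance, p0=1.0)
```

In a multi-parameter setting the scalar slope would be replaced by a Jacobian and the division by a linear solve.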
Funding: Supported by the National Natural Science Foundation of China (62231020, 62101401), the Fundamental Research Funds for the Central Universities (ZYTS23178), and the Youth Innovation Team of Shaanxi Universities.
Abstract: Diverse traffic participants and complex traffic environments (e.g., roadblocks or road damage) challenge the decision-making accuracy of a single connected and autonomous vehicle (CAV) because of its limited sensing and computing capabilities. Using the Internet of Vehicles (IoV) to share driving rules between CAVs can overcome the limitations of a single CAV, but may also cause privacy and safety issues. To tackle this problem, this paper proposes combining IoV and blockchain technologies to form an efficient and accurate autonomous guidance strategy. Specifically, reinforcement learning is first used for driving decision learning, and a corresponding driving rule extraction method is given. Then, an architecture combining IoV and blockchain is designed to ensure secure driving rule sharing. Finally, the shared rules form an effective autonomous driving guidance strategy through driving rule selection and action selection. Extensive simulations show that the proposed strategy performs well in complex traffic environments, mainly in terms of accuracy, safety, and robustness.
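The tamper-evidence that a blockchain gives to shared driving rules can be illustrated with a plain hash chain. This is a deliberate simplification with no consensus, signatures, or networking; the rule format (`state`/`action` dictionaries) and all function names are hypothetical and not taken from the paper.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def block_hash(index, rule, prev_hash):
    """SHA-256 digest over a canonical JSON encoding of the block contents."""
    payload = json.dumps({"index": index, "rule": rule, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_rule(chain, rule):
    """Append a driving rule, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "rule": rule, "prev": prev_hash,
                  "hash": block_hash(index, rule, prev_hash)})


def verify(chain):
    """Return True only if every block's link and digest are consistent."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["rule"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


# Two hypothetical extracted rules shared between vehicles.
ledger = []
append_rule(ledger, {"state": "roadblock_ahead", "action": "change_lane"})
append_rule(ledger, {"state": "clear_road", "action": "keep_lane"})
```

Changing any stored rule after the fact alters its digest, so `verify` fails for the modified chain; that property is what makes shared rules auditable.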
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFF0304902 and the Science and Technology Research Project of Jiangxi Provincial Department of Education under Grant No. GJJ202511.
Abstract: The authenticity identification of anti-counterfeiting codes on mobile phone platforms is affected by the lighting environment, photographing habits, camera resolution, and other factors, resulting in poor capture quality and weak discrimination of high-quality counterfeits. Developing an anti-counterfeiting code authentication algorithm for mobile phones is of great commercial value. Although existing algorithms developed for special equipment can effectively identify forged anti-counterfeiting codes, identification schemes based on mobile phones are still in their infancy. To address the small differences in texture features, the low response speed, and the excessively large deep learning models used in mobile phone anti-counterfeiting identification scenarios, we propose a feature-guided double pool attention network (FG-DPANet) to solve the reprinting forgery problem of printed anti-counterfeiting codes. To address the slight differences in texture features in high-quality reprinted anti-counterfeiting codes, we propose a feature guidance algorithm that combines the texture features with the inherent noise features of the scanner and printer introduced in the reprinting process to identify the authenticity of anti-counterfeiting codes. The introduction of noise features compensates for the small texture difference of high-quality anti-counterfeiting codes. The double pool attention network (DPANet) is a lightweight double pool attention residual network. While maintaining detection accuracy, DPANet simplifies the network structure as much as possible, improves inference speed, and runs better on mobile devices with low computing power. We conducted a series of experiments to evaluate the proposed FG-DPANet. Experimental results show that FG-DPANet can resist high-quality, small-size reprint forgery of anti-counterfeiting codes. A comparison with an existing texture-based algorithm shows that the proposed method has higher authentication accuracy. Finally, the proposed scheme has been evaluated on blurred anti-counterfeiting codes, and the results show that it is robust to slight blurring of anti-counterfeiting images.
Abstract: This paper proposes a novel approach for physical human-robot interaction (pHRI), in which a robot provides guidance forces to a user based on the user's performance. The framework tunes the forces according to the behavior of each user in coping with different tasks, so that lower performance results in stronger intervention from the robot. This personalized physical human-robot interaction (p2HRI) method incorporates adaptive modeling of the interaction between the human and the robot as well as learning from demonstration (LfD) techniques to adapt to the user's performance. The approach is based on model predictive control, in which the system optimizes the rendered forces by predicting the performance of the user. Moreover, continuous learning of the user's behavior is added so that the models and personalized considerations are updated as the user's performance changes over time. Applying this framework to a field such as haptic guidance for skill improvement allows a more personalized learning experience, in which the interaction between the robot as the intelligent tutor and the student as the user is better adjusted to the skill level of the individual and their gradual improvement. The results suggest that the proposed method improves the precision of the interaction model, and that adding the considered personalized factors leads to a more adaptive strategy for rendering guidance forces.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61973253 and 62006192).
Abstract: In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, because the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual decoy is established. Then, an improved DDPG algorithm based on a trusted-search strategy is proposed, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, by combining the established model, the network obtained by the improved DDPG algorithm, and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme and other methods are compared to further demonstrate its superior performance.
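One plausible reading of the energy center method is an intensity-weighted centroid of the target and decoy positions, sketched below. The abstract does not give the paper's exact definition, so the weighting by radiated intensity, the function name `energy_center`, and the sample numbers are all assumptions.

```python
def energy_center(target_pos, target_intensity, decoy_pos, decoy_intensity):
    """Intensity-weighted centroid of a target and a decoy.

    Positions are coordinate tuples of equal length; intensities are
    non-negative infrared radiation strengths in the same units.
    """
    total = target_intensity + decoy_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return tuple(
        (target_intensity * t + decoy_intensity * d) / total
        for t, d in zip(target_pos, decoy_pos)
    )


# Hypothetical case: the decoy radiates three times as strongly as the target.
virtual_target = energy_center((1000.0, 200.0), 1.0, (1000.0, 100.0), 3.0)
```

Under this reading the virtual target is pulled toward whichever source radiates more strongly, which is why a bright decoy can draw the aim point away from the true target.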