Abstract: Reinforcement learning has been applied to air combat problems in recent years, and curriculum learning is often used with it, but traditional curriculum learning suffers from plasticity loss in neural networks. Plasticity loss is the difficulty of learning new knowledge after the network has converged. To this end, we propose a motivational curriculum learning distributed proximal policy optimization (MCLDPPO) algorithm, through which trained agents can significantly outperform the predictive game tree and mainstream reinforcement learning methods. Motivational curriculum learning is designed to help the agent gradually improve its combat ability by observing where the agent performs unsatisfactorily and providing appropriate rewards as guidance. Furthermore, complete tactical maneuvers are encapsulated based on existing air combat knowledge, and through the flexible use of these maneuvers, some tactics beyond human knowledge can be realized. In addition, we design an interruption mechanism that increases the agent's decision-making frequency in an emergency: when the number of threats the agent faces changes, the current action is interrupted so that the agent can reacquire observations and decide again. Using the interruption mechanism can significantly improve the performance of the agent. To better simulate actual air combat, we use digital twin technology and propose a parallel battlefield mechanism that runs multiple simulation environments simultaneously, effectively improving data throughput. The experimental results demonstrate that the agent can fully utilize situational information to make reasonable decisions and adapt its tactics in air combat, verifying the effectiveness of the proposed algorithmic framework.
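The interruption rule described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decision_steps`, the fixed maneuver length, and the per-step list of threat counts are all assumptions made for illustration. The sketch only records at which simulation steps an agent would make a new decision.

```python
from typing import List, Sequence


def decision_steps(threat_counts: Sequence[int],
                   maneuver_len: int,
                   use_interrupt: bool = True) -> List[int]:
    """Return the simulation steps at which the agent makes a decision.

    threat_counts[t] is the (hypothetical) number of threats seen at step t.
    A maneuver normally runs for maneuver_len steps; with use_interrupt=True
    it is cut short as soon as the threat count changes.
    """
    decisions: List[int] = []
    steps_left = 0      # remaining steps of the maneuver in progress
    last_count = None   # threat count seen at the previous step
    for t, count in enumerate(threat_counts):
        changed = last_count is not None and count != last_count
        if steps_left == 0 or (use_interrupt and changed):
            decisions.append(t)        # re-observe and decide again
            steps_left = maneuver_len  # start a fresh maneuver
        steps_left -= 1
        last_count = count
    return decisions


if __name__ == "__main__":
    threats = [0, 0, 1, 1, 1, 1, 1, 1]
    print(decision_steps(threats, maneuver_len=4))
    print(decision_steps(threats, maneuver_len=4, use_interrupt=False))
```

In this toy trace a new threat appears at step 2, so the interrupting agent decides again immediately instead of waiting for the four-step maneuver to finish.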
Funding: Financial support was provided by the National Natural Science Foundation of China (Grant Nos. 72373138 and 71973131) and the Major Project of the National Social Science Foundation of China (Grant No. 19VHQ 002).
Abstract: Based on a panel dataset spanning 2003 to 2019 and covering 284 prefecture-level cities in China, this study treats the implementation of the carbon emissions trading policy (CETP) as a quasi-natural experiment. It employs a spatial difference-in-differences (DID) framework to quantify both the direct and the spatially mediated impacts of CETP on urban carbon emission efficiency (CEE), and further examines the underlying mechanisms and the heterogeneity of these effects. The findings show that CETP effectively enhances CEE in the pilot cities; however, it concurrently dampens CEE in neighboring non-pilot cities. These conclusions remain robust across a range of sensitivity tests. The mechanism analysis reveals that CETP’s influence on urban CEE operates primarily through technological innovation and optimization of the energy structure. Moreover, the heterogeneity analysis shows that CETP’s direct effect significantly improves CEE in eastern, old industrial base, and central cities. In terms of indirect effects, a pronounced negative spatial spillover is observed in eastern and old industrial base cities, while notable positive spatial spillovers emerge in central cities. Notably, CETP’s spatial influence on urban CEE declines beyond a distance of 900 km. These insights offer guidance for China in refining its nationwide carbon market and accelerating the shift toward a low-carbon economy.
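For orientation, a spatial DID model of the kind named in the abstract above is commonly written in spatial Durbin form. The abstract does not give the authors' exact specification, so the equation below is a generic sketch; the symbols $w_{ij}$, $X_{it}$, $\mu_i$, and $\lambda_t$ are assumptions introduced here.

```latex
% Generic spatial Durbin DID specification (illustrative, not taken from the paper).
% CEE_{it}: carbon emission efficiency of city i in year t
% DID_{it}: 1 if city i is a CETP pilot city and year t is after the policy start, else 0
% w_{ij}:   element of a row-normalized spatial weight matrix (assumed)
% X_{it}:   control variables; \mu_i, \lambda_t: city and year fixed effects (assumed)
\begin{equation}
  CEE_{it} = \rho \sum_{j} w_{ij}\, CEE_{jt}
           + \beta\, DID_{it}
           + \theta \sum_{j} w_{ij}\, DID_{jt}
           + \gamma^{\top} X_{it}
           + \mu_i + \lambda_t + \varepsilon_{it}
\end{equation}
```

In such a specification the policy's effect on the treated city and its spillover onto neighbors depend on $\beta$, $\theta$, and $\rho$ together; when $\rho \neq 0$, the direct and indirect effects are usually reported from the model's partial derivatives rather than read off the coefficients directly.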
Funding: Supported by the Advanced Research Fund for Weapons and Equipment Development of China.
Abstract: Based on the similarity of separation time, a similarity law optimization method for high-speed weapon delivery tests is derived. A typical separation state under wind load is simulated numerically. The real separation data of the aircraft, separation data of previous test methods, separation data of the ideal wind tunnel test of previous methods, and simulation data of the proposed optimization method are obtained. A comparison of the data shows that the proposed method can improve tracking performance. The similarity law optimization starts from the motion and dynamic equations in the windless state, in order to address the mismatch between vertical and horizontal displacement and the separation trajectory distortion caused by the insufficient gravitational acceleration of the scaled model in the existing light model method. The ejection velocity of the model is taken as a factor/vector and adjusted appropriately to compensate for the linear displacement deficit caused by the insufficient vertical acceleration of the light model method, so as to match the vertical and horizontal displacement of the projectile and to improve the consistency between high-speed delivery test results and the actual separation trajectory. The optimized similarity law is applicable to many existing free-drop modes in high-speed wind tunnels and is not affected by the ejection velocity or the hanging mode of the projectile. It is suitable not only for launches from buried ammunition compartments and of external stores, but also for the test design of projectile launching and gravity separation.
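The compensation idea in the abstract above can be illustrated with elementary constant-acceleration kinematics. The sketch below is not the paper's derivation: it ignores aerodynamic forces, and the function names, the scaling relation for the required acceleration, and the sample numbers are all assumptions. It computes the extra ejection velocity that makes the model's vertical displacement at one chosen separation time equal to what a fully similar model would show.

```python
G = 9.81  # gravitational acceleration in the wind tunnel, m/s^2


def required_acceleration(length_scale: float, velocity_ratio: float,
                          g: float = G) -> float:
    """Vertical acceleration full similarity would demand of the model.

    length_scale   = model length / full-scale length (assumed, e.g. 0.05)
    velocity_ratio = model velocity / full-scale velocity (assumed)
    Time scales as length_scale / velocity_ratio, so acceleration scales as
    velocity_ratio**2 / length_scale.
    """
    return g * velocity_ratio ** 2 / length_scale


def ejection_velocity_increment(length_scale: float, velocity_ratio: float,
                                t_model: float, g: float = G) -> float:
    """Extra ejection velocity that offsets the displacement deficit at t_model."""
    deficit = required_acceleration(length_scale, velocity_ratio, g) - g
    # Deficit in displacement is 0.5 * deficit * t^2; dividing by t gives dv.
    return 0.5 * deficit * t_model


def vertical_displacement(v0: float, accel: float, t: float) -> float:
    """Displacement under constant acceleration from initial velocity v0."""
    return v0 * t + 0.5 * accel * t ** 2


if __name__ == "__main__":
    k, rv, t_m, v0 = 0.05, 1.0, 0.01, 5.0  # hypothetical test conditions
    dv = ejection_velocity_increment(k, rv, t_m)
    ideal = vertical_displacement(v0, required_acceleration(k, rv), t_m)
    actual = vertical_displacement(v0 + dv, G, t_m)
    print(dv, ideal, actual)
```

Under these assumptions the match is exact only at the chosen time `t_model`; at other instants the compensated trajectory still deviates from the ideal one, which is why the choice of matching time matters.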