Funding: Supported in part by the Beijing Municipal Science and Technology Project (No. Z191100007419010), the Automobile Industry Joint Fund (No. U1764261) of the National Natural Science Foundation of China, the Shandong Key R&D Program (No. 2020CXGC010118), and the Key Laboratory for New Technology Application of Road Conveyance of Jiangsu Province (No. BM20082061706).
Abstract: Behavioral decision-making at urban intersections is one of the primary difficulties currently impeding the development of intelligent vehicle technology: existing decision-making algorithms cannot effectively deal with the complex, random scenarios that arise at urban intersections. To address this, a deep deterministic policy gradient (DDPG) decision-making algorithm (T-DDPG) based on a time-series Markov decision process (T-MDP) was developed, in which the state is extended to collect observations from several consecutive frames. Experiments found that T-DDPG converged better and generalized better in complex intersection scenarios than a traditional DDPG algorithm. Furthermore, model-agnostic meta-learning (MAML) was incorporated into T-DDPG to improve the training procedure, yielding a decision algorithm (T-MAML-DDPG) based on second-order gradients. Simulation experiments of intersection scenarios were carried out on the Gym-Carla platform to verify and compare the decision models. The results showed that T-MAML-DDPG could readily handle the random states of complex intersection scenarios, which can improve traffic safety and efficiency. These meta-reinforcement-learning-based decision-making models are significant for enhancing the decision-making ability of intelligent vehicles at urban intersections.
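The core of the T-MDP idea, extending the state to a window of several consecutive observation frames, can be sketched as a generic frame-stacking helper. This is an illustration only; the class name, window size, and observation layout below are assumptions, not taken from the paper.

```python
from collections import deque

import numpy as np


class FrameStackedState:
    """Sliding window over the last k observation frames.

    The agent's state is the concatenation of k consecutive observations
    rather than a single frame.
    """

    def __init__(self, k, obs_dim):
        self.k = k
        self.obs_dim = obs_dim
        self.frames = deque(maxlen=k)

    def reset(self, obs):
        # At episode start, fill the window by repeating the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(np.asarray(obs, dtype=np.float32))
        return self.state()

    def step(self, obs):
        # Push the newest frame; the oldest one falls out automatically.
        self.frames.append(np.asarray(obs, dtype=np.float32))
        return self.state()

    def state(self):
        # Flat state vector of size k * obs_dim, oldest frame first.
        return np.concatenate(self.frames)
```

A DDPG actor and critic would then consume this k * obs_dim vector in place of the single-frame observation.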
Abstract: The UAV pursuit-evasion problem focuses on the efficient tracking and capture of evading targets using unmanned aerial vehicles (UAVs), which is pivotal in public safety applications, particularly in scenarios involving intrusion monitoring and interception. To address the challenges of data acquisition, real-world deployment, and the limited intelligence of existing algorithms in UAV pursuit-evasion tasks, we propose an innovative swarm-intelligence-based UAV pursuit-evasion control framework, "Boids Model-based DRL Approach for Pursuit and Escape" (Boids-PE), which synergizes the strengths of swarm intelligence from bio-inspired algorithms and deep reinforcement learning (DRL). We adopt the Boids model, which simulates collective behavior through three fundamental rules: separation, alignment, and cohesion. By integrating the Boids model with the Apollonian Circles algorithm, the pursuit UAVs achieve significant improvements in capturing targets that use simple evasion strategies. To further enhance decision-making precision, we incorporate a DRL algorithm to facilitate more accurate strategic planning, and we leverage self-play training to continuously optimize the performance of the pursuit UAVs. For experimental evaluation, we designed both one-on-one and many-to-one pursuit-evasion scenarios, customizing the state space, action space, and reward function for each scenario. Extensive simulations, supported by the PyBullet physics engine, validate the effectiveness of the proposed method. The overall results demonstrate that Boids-PE significantly enhances the efficiency and reliability of UAV pursuit-evasion tasks, providing a practical and robust solution for real-world UAV pursuit-evasion missions.
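The three Boids rules named above (separation, alignment, cohesion) amount to a single velocity update per boid. The sketch below is illustrative only: all weights, radii, and the speed cap are assumed values, and the paper's coupling with the Apollonian Circles algorithm and DRL is not shown.

```python
import numpy as np


def boids_step(positions, velocities, r_sep=1.0, r_neigh=5.0,
               w_sep=1.5, w_align=1.0, w_coh=1.0, dt=0.1, v_max=2.0):
    """One synchronous update of the classic Boids rules.

    positions, velocities: float arrays of shape (n, 2).
    Returns the updated (positions, velocities).
    """
    n = len(positions)
    new_vel = velocities.copy()
    for i in range(n):
        offsets = positions - positions[i]      # vectors to every boid
        dist = np.linalg.norm(offsets, axis=1)
        neigh = (dist > 0) & (dist < r_neigh)   # neighbours, excluding self
        if neigh.any():
            close = (dist > 0) & (dist < r_sep)
            # Separation: steer away from boids that are too close.
            sep = -offsets[close].sum(axis=0) if close.any() else np.zeros(2)
            # Alignment: match the neighbours' average velocity.
            align = velocities[neigh].mean(axis=0) - velocities[i]
            # Cohesion: steer toward the neighbours' centre of mass.
            coh = positions[neigh].mean(axis=0) - positions[i]
            new_vel[i] = velocities[i] + dt * (w_sep * sep + w_align * align + w_coh * coh)
    # Cap every boid's speed at v_max.
    speed = np.linalg.norm(new_vel, axis=1, keepdims=True)
    new_vel = new_vel * np.minimum(1.0, v_max / np.maximum(speed, 1e-12))
    return positions + dt * new_vel, new_vel
```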
基金supported in part by the National Natural Science Foundation of China(62106053)the Guangxi Natural Science Foundation(2020GXNSFBA159042)+2 种基金Innovation Project of Guangxi Graduate Education(YCSW2023478)the Guangxi Education Department Program(2021KY0347)the Doctoral Fund of Guangxi University of Science and Technology(XiaoKe Bo19Z33)。
Abstract: In the face of network attacks, the cloud boundary network environment is characterized by a passive defense strategy, discrete defense actions, and delayed defense feedback, and it ignores the influence of the external environment on defense decisions, resulting in poor defense effectiveness. This paper therefore proposes an active defense model and decision method for cloud boundary networks based on agent reinforcement learning. It designs the network structure of the agent attack-defense game and depicts the attack-defense game process of the cloud boundary network; constructs the agent's observation space and action space for reinforcement learning in an incomplete-information environment and portrays the interaction between the agent and the environment; and establishes a reward mechanism based on attack and defense gains that encourages the agent to learn more effective defense strategies. The resulting active defense decision agent, based on deep reinforcement learning, can solve the problems of boundary dynamics, interaction lag, and control dispersion in the defense decision process of cloud boundary networks, and improves the autonomy and continuity of defense decisions.
Funding: Supported by the National Key R&D Program of China (No. 2018YFE010267), the Science and Technology Program of Sichuan Province, China (No. 2019YFH0007), the National Natural Science Foundation of China (No. 61601083), the Xi'an Key Laboratory of Mobile Edge Computing and Security (No. 201805052-ZD-3CG36), and the EU H2020 Project COSAFE (MSCA-RISE-2018-824019).
Abstract: By integrating advanced communication and data processing technologies into smart vehicles and roadside infrastructure, the Intelligent Transportation System (ITS) has evolved into a promising paradigm for improving the safety and efficiency of the transportation system. However, the strict delay requirements of safety-related applications remain a great challenge for the ITS, especially in dense traffic environments. In this paper, we introduce a metric called Perception-Reaction Time (PRT), which reflects the time consumption of safety-related applications and is closely related to road efficiency and security. By combining information-centric networking technology with a fog virtualization approach, we propose a novel fog resource scheduling mechanism to minimize the PRT. Furthermore, we adopt a deep reinforcement learning approach to design an online optimal resource allocation scheme. Numerical results demonstrate that our proposed scheme reduces the PRT by about 70% compared with the traditional approach.
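As a point of reference for what a fog resource scheduler must do, a naive greedy baseline (assign each task to the fog node that would finish it earliest) can be sketched as follows. This is a stand-in illustration, not the paper's DRL-based scheme; task loads and node speeds are hypothetical inputs.

```python
import heapq


def schedule_tasks(task_loads, node_speeds):
    """Greedy list scheduling: each task goes to the node that finishes it earliest.

    task_loads: work units per task; node_speeds: work units per second per node.
    Returns (assignment, makespan), where assignment[i] is the node chosen for
    the i-th task in descending-load order.
    """
    # Min-heap of (current finish time, node index).
    heap = [(0.0, i) for i in range(len(node_speeds))]
    heapq.heapify(heap)
    assignment = []
    for load in sorted(task_loads, reverse=True):   # longest task first
        t, i = heapq.heappop(heap)                  # earliest-free node
        t += load / node_speeds[i]
        assignment.append(i)
        heapq.heappush(heap, (t, i))
    makespan = max(t for t, _ in heap)
    return assignment, makespan
```

A learned scheduler would aim to beat such a baseline by also accounting for network delay and task deadlines, which this sketch ignores.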
Funding: This research was funded by the National Natural Science Foundation of China, Grant Number 62106283.
Abstract: To address the low solution accuracy and high decision pressure that a single agent faces in large-scale dynamic task allocation (DTA) with a high-dimensional decision space, this paper draws on deep reinforcement learning (DRL) theory and proposes an improved Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG-D2), built on a multi-agent architecture with a dual experience replay pool and dual noise, to improve the efficiency of DTA. The algorithm extends the traditional Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm by introducing a double noise mechanism to enlarge the action exploration space in the early stage of training and a double experience pool to improve data utilization. At the same time, to accelerate the agents' training and solve the cold-start problem, prior knowledge is applied to the training of the algorithm. Finally, MADDPG-D2 is compared and analyzed on a digital battlefield of ground-air confrontation. The experimental results show that agents trained by MADDPG-D2 achieve higher win rates and average rewards, utilize resources more reasonably, and better overcome the difficulty that traditional single-agent algorithms face in high-dimensional decision spaces. The MADDPG-D2 algorithm proposed in this paper thus shows clear superiority and rationality in DTA.
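The dual experience replay pool named above can be sketched as two buffers: a general pool for every transition and an "elite" pool that additionally keeps high-reward transitions, with minibatches drawn from both. The split ratio, reward threshold, and FIFO eviction below are illustrative assumptions, not the paper's exact design.

```python
import random


class DualReplayPool:
    """Two replay buffers mixed at sampling time: a general pool for all
    transitions and an elite pool for high-reward ones."""

    def __init__(self, capacity=10_000, reward_threshold=0.0, mix=0.5):
        self.capacity = capacity
        self.reward_threshold = reward_threshold
        self.mix = mix                      # fraction of each batch from the elite pool
        self.general, self.elite = [], []

    def _push(self, pool, transition):
        if len(pool) >= self.capacity:      # FIFO eviction when full
            pool.pop(0)
        pool.append(transition)

    def add(self, transition, reward):
        self._push(self.general, transition)
        if reward > self.reward_threshold:  # high-reward transitions kept twice
            self._push(self.elite, transition)

    def sample(self, batch_size):
        n_elite = min(int(batch_size * self.mix), len(self.elite))
        batch = random.sample(self.elite, n_elite) if n_elite else []
        n_general = min(batch_size - n_elite, len(self.general))
        return batch + random.sample(self.general, n_general)
```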
Abstract: As the pioneer among the intelligent construction technologies (ICT) of transportation infrastructure, intelligent compaction (IC) has been applied in infrastructure construction in various countries and is currently the technology that best reflects the intelligence of engineering construction. This article overviews the latest developments and trends in IC. Firstly, the basic meaning of ICT is defined based on the essential characteristics of intelligent construction of transportation infrastructure: "perception, analysis, decision-making, execution" (PADE). The concept of intelligent compaction technology classification is also introduced, and the PADE requirements that intelligent compaction should meet are proposed. Secondly, following the sequence of perception, analysis, decision-making, and execution, the workflow and key technologies of intelligent compaction are analyzed, and the mechanism of using the roller's response to solve for the modulus is given and verified. On this basis, the IC feasibility test methods, including compaction degree, compaction stability, and compaction uniformity, are briefly described, and an implementation scheme for feedback control is given. Then, the use of artificial neural networks (ANN), hybrid expert systems, and reinforcement learning in intelligent compaction is briefly introduced. Finally, several extended applications of intelligent compaction are expounded, including development ideas for intelligent road rollers, the role of intelligent compaction in virtual construction, the layer-specific mechanical parameters of fillers, etc.
Funding: Funded by the National Natural Science Foundation of China, Grant Number 62106283.
Abstract: To address the problems of traditional dynamic weapon-target assignment algorithms in command decision-making, such as heavy computation, slow solution speed, and low accuracy, this paper combines deep reinforcement learning theory and proposes an improved Deep Deterministic Policy Gradient algorithm with dual noise and prioritized experience replay: a double noise mechanism expands the search range of the action, and a prioritized experience replay mechanism improves data utilization. Finally, the algorithm is simulated and validated on a ground-to-air countermeasures digital battlefield. The experimental results show that, under the deep-neural-network framework for intelligent weapon-target assignment proposed in this paper, agents trained with reinforcement learning algorithms such as the Deep Deterministic Policy Gradient algorithm, the Asynchronous Advantage Actor-Critic algorithm, and the Deep Q Network algorithm outperform the traditional RELU algorithm, indicating that deep reinforcement learning is a sound approach to the weapon-target assignment problem in air defense operations. Compared with the other reinforcement learning algorithms, the agent trained by the improved Deep Deterministic Policy Gradient algorithm achieves a higher win rate and reward in confrontation and uses weapon resources more efficiently, showing that the model and algorithm have certain superiority and rationality. The results of this paper provide new ideas for solving the weapon-target assignment problem in air defense combat command decisions.
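The prioritized experience replay mechanism named above can be sketched as a buffer that samples transitions in proportion to their TD error. This is the standard proportional variant, given for illustration only; the abstract does not state the paper's hyperparameters, so the capacity, alpha, and epsilon values below are placeholders.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay: transitions are sampled with
    probability proportional to |TD error| ** alpha."""

    def __init__(self, capacity=10_000, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []
        self.pos = 0                        # ring-buffer write position

    def add(self, transition, td_error=1.0):
        p = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:                               # overwrite oldest entry
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the learner recomputes TD errors.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha
```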
Funding: Supported by the National Key Research and Development Program of China (2017YFB0102601) and the National Natural Science Foundation of China (51775236).
Abstract: The anthropomorphic intelligence of autonomous driving has been a research hotspot worldwide, yet current studies have not been able to reveal the mechanism of drivers' natural driving behaviors. This paper therefore starts from the perspective of cognitive decision-making in the human brain. Inspired by the regulation of dopamine feedback in the basal ganglia, a reinforcement learning model is established to solve brain-like intelligent decision-making problems in the process of interacting with the environment. First, a detailed bionic mechanism architecture based on the basal ganglia is proposed through consideration and analysis of its feedback regulation mechanism; second, this mechanism is transformed into a reinforcement Q-learning model, implementing the learning and adaptation abilities an intelligent vehicle needs for brain-like decision-making during car-following; finally, the feasibility and effectiveness of the proposed method are verified by simulations and real-vehicle tests.
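The link between dopamine feedback and Q-learning rests on the temporal-difference (TD) error, which plays the role of the reward-prediction-error signal carried by dopamine. A minimal tabular update can be sketched as below; the learning rate, discount factor, and state encoding are illustrative assumptions, not the paper's model.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step.

    The TD error `delta` is the analogue of the dopamine
    reward-prediction-error: positive when the outcome is better than
    expected, negative when it is worse.
    """
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * delta
    return delta
```

In a car-following setting, `s` might be a discretized (gap, relative speed) pair and `a` a discrete acceleration level.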