Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts pose a distinct set of challenges for autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, rendering them a vital issue in the practical application of autonomous driving. To address these traffic challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) model for autonomous driving in roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle's capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model combines a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and augmenting throughput efficiency. Extensive experiments were conducted on the collaborative simulation platform of CARLA and SUMO, and the results show that the proposed PE-DDPG outperforms the baseline methods in terms of the convergence of the training process, the smoothness of driving, and traffic efficiency under diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). Generally, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
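To make the perception side concrete, below is a minimal PyTorch sketch of a VAE encoder with a spatial attention layer producing the latent state that a DDPG actor would consume instead of raw frames. The layer sizes, attention form, and input resolution are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weights each spatial location of a feature map by a learned mask."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):            # x: (B, C, H, W)
        return x * self.mask(x)      # (B, 1, H, W) mask broadcast over channels

class AttentiveVAEEncoder(nn.Module):
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            SpatialAttention(64),
            nn.Flatten(),
        )
        self.mu = nn.LazyLinear(latent_dim)       # mean of q(z|x)
        self.logvar = nn.LazyLinear(latent_dim)   # log-variance of q(z|x)

    def forward(self, obs):
        h = self.features(obs)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar

# The DDPG actor consumes z rather than raw camera frames:
encoder = AttentiveVAEEncoder()
z, mu, logvar = encoder(torch.randn(1, 3, 84, 84))
```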
Deep deterministic policy gradient (DDPG) has been proven effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work probes into this topic. Experiments showed that DDPG can not only quickly improve the convergence speed of MODPSO but also overcome the local-optimum problem that MODPSO may suffer from. The findings are of great significance for the theoretical research and application of MODPSO.
The path planning of Unmanned Aerial Vehicles (UAVs) is a critical issue in emergency communication and rescue operations, especially in adversarial urban environments. Due to the continuity of the flying space, complex building obstacles, and the aircraft's high dynamics, traditional algorithms cannot find the optimal collision-free flying path between the UAV station and the destination. Accordingly, in this paper, we study the fast UAV path planning problem in a 3D urban environment from a source point to a target point and propose a Three-Step Experience Buffer Deep Deterministic Policy Gradient (TSEB-DDPG) algorithm. We first build a 3D model of a complex urban environment with buildings and project the 3D building surfaces onto many 2D geometric shapes. After this transformation, we apply Hierarchical Learning Particle Swarm Optimization (HL-PSO) to obtain an empirical path. Then, to ensure the accuracy of the obtained paths, the empirical path, collision information, and fast-transition information are stored in the three experience buffers of the TSEB-DDPG algorithm as dynamic guidance information, and the sampling ratio of each buffer is dynamically adapted to the training stage. Moreover, we design a reward mechanism to improve the convergence speed of the DDPG algorithm for UAV path planning. The proposed TSEB-DDPG algorithm is compared experimentally with three widely used competitors, and the results show that it achieves the fastest convergence speed and the highest accuracy. We also conduct experiments in real scenarios and compare the paths planned by the HL-PSO, DDPG, and TSEB-DDPG algorithms. The results show that TSEB-DDPG performs nearly best in terms of accuracy, the average time of actual path planning, and the success rate.
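A minimal sketch of the three-buffer sampling idea follows: empirical-path, collision, and fast-transition experiences live in separate buffers, and the per-buffer sampling ratio shifts with the training stage. The linear schedule and the specific ratios below are illustrative assumptions, not the paper's exact values.

```python
import random
from collections import deque

class ThreeStepBuffer:
    def __init__(self, capacity=100_000):
        self.buffers = {k: deque(maxlen=capacity)
                        for k in ("empirical", "collision", "fast")}

    def add(self, kind, transition):
        self.buffers[kind].append(transition)

    def ratios(self, step, total_steps):
        """Early training leans on the HL-PSO empirical path; later training
        leans on the agent's own fast transitions (assumed linear schedule)."""
        t = min(step / total_steps, 1.0)
        return {"empirical": 0.6 * (1 - t) + 0.1 * t,
                "collision": 0.2,
                "fast":      0.2 * (1 - t) + 0.7 * t}

    def sample(self, batch_size, step, total_steps):
        batch = []
        for kind, r in self.ratios(step, total_steps).items():
            pool = self.buffers[kind]
            n = min(int(batch_size * r), len(pool))
            batch.extend(random.sample(list(pool), n))
        return batch
```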
The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches based on value iteration and policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense with DDPG via swarms of unmanned aerial vehicles (UAVs) is developed and validated, showing promising practical value in defense effectiveness. We address the sparse-reward problem of reinforcement learning in a long-term task by building a reward function for UAV swarms and optimizing the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the requirements of a UAV swarm for decentralization and autonomy and promoting the intelligent development of UAV swarms and their decision-making processes.
Device-to-device (D2D) communications underlaying cellular networks enabled by unmanned aerial vehicles (UAVs) have been regarded as a promising technique for next-generation communications. To mitigate the strong interference caused by the line-of-sight (LoS) air-to-ground channels, we deploy a reconfigurable intelligent surface (RIS) to rebuild the wireless channels. A joint optimization problem over the transmit power of the UAV, the transmit power of the D2D users, and the RIS phase configuration is investigated to maximize the achievable rate of the D2D users while satisfying the quality-of-service (QoS) requirement of the cellular users. Due to the high channel dynamics and the coupling among the cellular users, the RIS, and the D2D users, it is challenging to find a proper solution. Thus, a RIS softmax deep double deterministic (RIS-SD3) policy gradient method is proposed, which can smooth the optimization space and reduce the number of local optima. Specifically, the SD3 algorithm maximizes the reward of the agent by training it to maximize the value function after a softmax operator is introduced. Simulation results show that the proposed RIS-SD3 algorithm can significantly improve the rate of the D2D users while controlling the interference to the cellular users. Moreover, the proposed RIS-SD3 algorithm is more robust than the twin delayed deep deterministic (TD3) policy gradient algorithm in a dynamic environment.
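The softmax operator at the heart of SD3 can be sketched as below: instead of a hard max (or TD3's min) over next actions, the target blends Q-values with a Boltzmann weighting, which smooths the optimization landscape. This is a simplified sketch; the importance-sampling correction used in the full SD3 algorithm is omitted, and the noise scale and temperature are illustrative assumptions.

```python
import torch

def softmax_q_target(critic, next_state, next_action, beta=1.0,
                     n_samples=50, noise_std=0.2):
    """Boltzmann-weighted value estimate over actions sampled near the
    target policy's action; feeds the Bellman target in place of a hard max."""
    B, A = next_action.shape
    a = next_action.unsqueeze(1) + noise_std * torch.randn(B, n_samples, A)
    a = a.clamp(-1.0, 1.0)                                  # keep actions in bounds
    s = next_state.unsqueeze(1).expand(-1, n_samples, -1)
    q = critic(s.reshape(B * n_samples, -1), a.reshape(B * n_samples, A))
    q = q.view(B, n_samples)
    w = torch.softmax(beta * q, dim=1)                      # smooth stand-in for max
    return (w * q).sum(dim=1, keepdim=True)
```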
The ever-changing battlefield environment requires robust and adaptive technologies integrated into a reliable platform. Unmanned combat aerial vehicles (UCAVs) aim to integrate such advanced technologies while increasing the tactical capabilities of combat aircraft. A common UCAV uses a neural-network fitting strategy to obtain attack-area values; however, this simple strategy cannot cope with complex environmental changes or autonomously optimize decision-making. To solve this problem, this paper proposes a new deep deterministic policy gradient (DDPG) strategy based on deep reinforcement learning for fitting the attack areas of UCAVs on the future battlefield. Simulation results show that the autonomy and environmental adaptability of UCAVs will be improved with the new DDPG algorithm and that the training process converges quickly. With the well-trained deep network, the optimal values of attack areas can be obtained in real time throughout the flight.
Eavesdropping attacks have become one of the most common attacks on networks because they are easy to implement. They not only lead to transmission data leakage but can also develop into other, more harmful attacks. Routing randomization is a relevant research direction for moving target defense and has been proven an effective method to resist eavesdropping attacks. In this study, we analyzed existing routing randomization methods and found that their security and usability need further improvement. According to the characteristics of eavesdropping attacks, which are "latent and transferable", a routing randomization defense method based on deep reinforcement learning is proposed. The method realizes routing randomization at packet-level granularity using programmable switches. To improve the security and quality of service of legitimate services in networks, we use the deep deterministic policy gradient to generate random routing schemes with support from powerful network state awareness; in-band network telemetry provides real-time, accurate, and comprehensive network state awareness for the proposed method. Various experiments show that, compared with other typical routing randomization defenses, the proposed method has obvious advantages in security and usability against eavesdropping attacks.
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) is characterized by strong coupling, non-convexity, and nonlinearity. Centralized optimization has a high communication cost and complex modeling, while traditional numerical iterative solutions cannot handle uncertainty or solve efficiently, making them difficult to apply online. For this problem, we constructed a model of the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed one in a multi-agent reinforcement learning environment using the multi-agent deep deterministic policy gradient. Introducing the dynamic distribution factor allows the system to consider the impact of real-time changes in supply and demand on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving system economy. Compared with centralized optimization, the distributed model with multiple decision centers achieves similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load during training; compared with traditional iterative solution methods, it copes better with uncertainty and enables real-time decision making, which is conducive to online application. Finally, we verify the effectiveness of the proposed method on an example of an IES coupled with three energy hub agents.
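The multi-agent structure described above follows the standard MADDPG pattern, sketched below: each energy-hub agent has its own actor over local observations, while each critic is trained centrally on the joint observation-action vector. The network sizes and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the final ReLU

n_agents, obs_dim, act_dim = 3, 16, 4   # three energy hub agents (dims assumed)
actors = [mlp([obs_dim, 64, act_dim]) for _ in range(n_agents)]
# Centralized critics see all observations and all actions during training.
critics = [mlp([n_agents * (obs_dim + act_dim), 128, 1]) for _ in range(n_agents)]

obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
acts = [torch.tanh(pi(o)) for pi, o in zip(actors, obs)]   # decentralized execution
joint = torch.cat(obs + acts, dim=-1)
q_values = [q(joint) for q in critics]                     # centralized training signal
```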
Integrated energy system optimization scheduling can improve energy efficiency and support a low-carbon economy. This paper studies an electricity-gas-heat integrated energy system including a carbon capture system, energy coupling equipment, and renewable energy. An energy scheduling strategy based on deep reinforcement learning is proposed to minimize operation cost and carbon emissions and to enhance power supply reliability. Firstly, a low-carbon mathematical model of the combined heat and power unit, carbon capture system, and power-to-gas unit (CCP) is established. Subsequently, we establish a low-carbon multi-objective optimization model considering system operation cost, carbon emission cost, integrated demand response, wind and photovoltaic curtailment, and load-shedding costs. Furthermore, considering the intermittency of wind power generation and the flexibility of load demand, the low-carbon economic dispatch problem is modeled as a Markov decision process. The twin delayed deep deterministic policy gradient (TD3) algorithm is used to solve the complex scheduling problem, and its effectiveness is verified in simulation case studies: compared with the SAC, A3C, DDPG, and DQN algorithms, the operating cost is reduced by 8.6%, 4.3%, 6.1%, and 8.0%, respectively.
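For reference, the core TD3 update used here is sketched below: twin critics take the minimum target value, the target action is smoothed with clipped noise, and the actor is updated at a lower frequency. The hyperparameters and the single shared critic optimizer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def td3_update(batch, actor, actor_t, q1, q2, q1_t, q2_t, opt_q, opt_pi,
               step, gamma=0.99, tau=0.005, policy_delay=2, noise=0.2, clip=0.5):
    s, a, r, s2, done = batch
    with torch.no_grad():
        eps = (noise * torch.randn_like(a)).clamp(-clip, clip)
        a2 = (actor_t(s2) + eps).clamp(-1.0, 1.0)         # target policy smoothing
        target = r + gamma * (1 - done) * torch.min(q1_t(s2, a2), q2_t(s2, a2))
    # opt_q is assumed to cover both critics' parameters.
    loss_q = F.mse_loss(q1(s, a), target) + F.mse_loss(q2(s, a), target)
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()
    if step % policy_delay == 0:                          # delayed actor update
        loss_pi = -q1(s, actor(s)).mean()
        opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
        for net, tgt in ((actor, actor_t), (q1, q1_t), (q2, q2_t)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.data.mul_(1 - tau).add_(tau * p.data)  # Polyak averaging
```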
Mobile Edge Computing (MEC) is a promising computing paradigm in which dynamic service migration is a key technology. To maintain service continuity in a dynamic environment, mobile users need to migrate tasks among multiple servers in real time. Due to the uncertainty of movement, frequent migration increases delays and costs, while no migration leads to service interruption; designing an optimal migration strategy is therefore very challenging. In this paper, we investigate the multi-user task migration problem in a dynamic environment and minimize the average service delay while meeting a migration-cost constraint. To jointly optimize service delay and migration cost, we propose an adaptive weight deep deterministic policy gradient (AWDDPG) algorithm, adopting centralized training with distributed execution to handle the high-dimensional problem. Experiments show that the proposed algorithm greatly reduces migration cost and service delay compared with related algorithms.
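One way the adaptive-weight idea could look in code is sketched below: the reward scalarizes service delay and migration cost with a weight that shifts whenever the cost constraint is violated. The update rule, bounds, and budget here are entirely illustrative assumptions, not the paper's scheme.

```python
class AdaptiveWeightReward:
    """Hypothetical adaptive scalarization of delay vs. migration cost."""
    def __init__(self, w=0.5, lr=0.05, cost_budget=1.0):
        self.w, self.lr, self.cost_budget = w, lr, cost_budget

    def __call__(self, delay, migration_cost):
        # Shift weight toward the cost term whenever the budget is exceeded,
        # and back toward delay otherwise; clamp to keep both terms active.
        direction = 1.0 if migration_cost > self.cost_budget else -1.0
        self.w = min(max(self.w + self.lr * direction, 0.1), 0.9)
        return -((1 - self.w) * delay + self.w * migration_cost)

reward_fn = AdaptiveWeightReward()
r = reward_fn(delay=0.8, migration_cost=1.3)  # budget exceeded -> cost weight rises
```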
While moving towards a low-carbon, sustainable electricity system, distribution networks are expected to host a large share of distributed generators, such as photovoltaic units and wind turbines. These inverter-based resources are intermittent but controllable, and are expected to amplify the role of distribution networks together with other distributed energy resources, such as storage systems and controllable loads. The available control methods for these resources are typically categorized, based on the available communication network, into centralized, distributed, and decentralized or local. Standard local schemes are typically inefficient, whereas centralized approaches raise implementation and cost concerns. This paper focuses on optimized decentralized control of distributed generators via supervised and reinforcement learning. We present existing state-of-the-art decentralized control schemes based on supervised learning, propose a new reinforcement learning scheme based on deep deterministic policy gradient, and compare both decentralized and centralized methods in terms of computational effort, scalability, privacy awareness, ability to consider constraints, and overall optimality. We evaluate the performance of the examined schemes on a benchmark European low-voltage test system. The results show that both the supervised learning and reinforcement learning schemes effectively mitigate the operational issues faced by the distribution network.
Unmanned aerial vehicle (UAV)-based edge computing is an emerging technology that provides fast task processing over a wide area. To address the limited computation resource of a single UAV and the finite communication resource in multi-UAV networks, this paper jointly considers task offloading and wireless channel allocation in a collaborative multi-UAV computing network, where a high-altitude platform station (HAPS) is adopted as the relay device for communication between UAV clusters consisting of UAV cluster heads (ch-UAVs) and mission UAVs (m-UAVs). We propose an algorithm that jointly optimizes task offloading and wireless channel allocation to maximize the average service success rate (ASSR) over a period of time. In particular, the simulated annealing (SA) algorithm with random perturbations is used for optimal channel allocation, aiming to reduce interference and minimize transmission delay. A multi-agent deep deterministic policy gradient (MADDPG) is proposed to obtain the best task-offloading strategy. Simulation results demonstrate the effectiveness of the SA algorithm in channel allocation; meanwhile, when computation and channel resources are considered jointly, the proposed scheme effectively enhances the ASSR compared with other benchmark algorithms.
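A minimal simulated-annealing sketch for the channel-allocation step follows: a random reassignment perturbs the current allocation, and worse allocations are accepted with a temperature-controlled Metropolis probability. The toy interference model, cooling schedule, and parameters are illustrative assumptions.

```python
import math, random

def anneal_channels(n_links, n_channels, interference, t0=10.0, cooling=0.95, iters=2000):
    assign = [random.randrange(n_channels) for _ in range(n_links)]
    best, best_cost, t = assign[:], interference(assign), t0
    for _ in range(iters):
        cand = assign[:]
        cand[random.randrange(n_links)] = random.randrange(n_channels)  # random perturbation
        d = interference(cand) - interference(assign)
        if d < 0 or random.random() < math.exp(-d / t):   # Metropolis acceptance
            assign = cand
            if interference(assign) < best_cost:
                best, best_cost = assign[:], interference(assign)
        t *= cooling                                       # geometric cooling
    return best, best_cost

# Toy interference: number of link pairs sharing a channel.
cost = lambda a: sum(a[i] == a[j] for i in range(len(a)) for j in range(i + 1, len(a)))
print(anneal_channels(8, 3, cost))
```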
A new online scheduling algorithm is proposed for photovoltaic (PV) systems with battery-assisted energy storage systems (BESS). The stochastic nature of renewable energy sources necessitates the employment of BESS to balance energy supply and demand under uncertain weather conditions. The proposed online scheduling algorithm aims to minimize the overall energy cost by performing actions such as load shifting and peak shaving through carefully scheduled BESS charging/discharging activities. The scheduling algorithm is developed using deep deterministic policy gradient (DDPG), a deep reinforcement learning (DRL) algorithm that can deal with continuous state and action spaces. One of the main contributions of this work is a new DDPG reward function, designed around the unique behaviors of energy systems, which guides the scheduler to learn appropriate load-shifting and peak-shaving behaviors through a balanced process of exploration and exploitation. The new scheduling algorithm is tested through case studies using real-world data, and the results indicate that it outperforms existing algorithms such as deep Q-learning and can efficiently learn the behavior of optimal non-causal offline algorithms.
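A minimal sketch of what such a reward could look like: penalize energy cost at the current tariff and add an extra penalty when grid draw exceeds a peak threshold, so the agent is pushed toward load shifting and peak shaving. The coefficients and threshold are illustrative assumptions, not the paper's reward design.

```python
def bess_reward(grid_power_kw, price_per_kwh, peak_threshold_kw=5.0,
                peak_penalty=0.5, dt_hours=1.0):
    cost = grid_power_kw * dt_hours * price_per_kwh          # energy-cost term
    overshoot = max(grid_power_kw - peak_threshold_kw, 0.0)  # peak-shaving term
    return -(cost + peak_penalty * overshoot * dt_hours)

# Example: drawing 8 kW at $0.30/kWh with a 5 kW peak threshold
r = bess_reward(8.0, 0.30)   # -> -(2.4 + 1.5) = -3.9
```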
Purpose – English original movies play an important role in English learning and communication. To help users find the movies they need among a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have addressed information overload, they still have limitations under cold start and sparse data. Design/methodology/approach – To solve these problems, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to handle the cold-start and sparse-data problems and uses Item2vec to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to ensure that the neural network does not converge to a local optimum prematurely. Findings – To verify the feasibility and validity of the proposed algorithm, the state of the art and the proposed algorithm are compared in terms of RMSE, recall rate, and accuracy on the MovieLens English original movie data set. Experimental results show that the proposed algorithm is superior to the conventional algorithms on all indicators. Originality/value – Applied to recommending English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.
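The combined-distance reward can be sketched as below: cosine similarity between the recommended item embedding and the user's preference vector raises the reward, while Euclidean distance lowers it. The mixing weight and vector semantics are illustrative assumptions.

```python
import numpy as np

def recommend_reward(item_vec, pref_vec, alpha=0.5):
    """Hypothetical reward mixing cosine similarity and Euclidean distance."""
    cos = np.dot(item_vec, pref_vec) / (
        np.linalg.norm(item_vec) * np.linalg.norm(pref_vec) + 1e-8)
    euc = np.linalg.norm(item_vec - pref_vec)
    return alpha * cos - (1 - alpha) * euc

r = recommend_reward(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
```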
Plug-in Hybrid Electric Vehicles (PHEVs) represent an innovative breed of transportation, harnessing diverse power sources for enhanced performance. Energy management strategies (EMSs) that coordinate and control different energy sources are a critical component of PHEV control technology, directly impacting overall vehicle performance. This study proposes an improved deep reinforcement learning (DRL)-based EMS that optimizes real-time energy allocation and coordinates the operation of multiple power sources. Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces. They often fail to strike an optimal balance between exploration and exploitation, their assumption of a static environment limits their ability to adapt to changing conditions, and they suffer from low sample efficiency. Collectively, these factors cause convergence difficulties, low learning efficiency, and instability. To address these challenges, the Deep Deterministic Policy Gradient (DDPG) algorithm is enhanced with entropy regularization and a summation-tree-based Prioritized Experience Replay (PER) method, improving exploration performance and learning efficiency from experience samples. Additionally, the corresponding Markov Decision Process (MDP) is established. Finally, an EMS based on the improved DRL model is presented. Comparative simulation experiments are conducted against rule-based, optimization-based, and DRL-based EMSs. The proposed strategy exhibits minimal deviation from the optimal solution obtained by the dynamic programming (DP) strategy, which requires global information. In typical driving scenarios based on the World Light Vehicle Test Cycle (WLTC) and New European Driving Cycle (NEDC), the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption (EFC) of 2696.77 g. Against the DP strategy baseline, the proposed method improved the fuel efficiency variances (FEV) by 18.13%, 15.1%, and 8.37% over the Deep Q-Network (DQN), Double DRL (DDRL), and original DDPG methods, respectively. These outcomes demonstrate that the proposed EMS based on the improved DRL framework possesses good real-time performance, stability, and reliability, effectively optimizing vehicle economy and fuel consumption.
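The summation tree underlying PER can be sketched in a few lines: leaves hold per-transition priorities, internal nodes hold sums, so sampling proportional to priority is O(log N). The flat-heap layout below assumes a power-of-two capacity; this is the standard data structure, not the paper's specific implementation.

```python
import random

class SumTree:
    def __init__(self, capacity):          # capacity assumed a power of two
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity) # 1-indexed heap: leaves at [capacity, 2*capacity)
        self.data = [None] * capacity
        self.write = 0

    def add(self, priority, transition):
        idx = self.write + self.capacity
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        delta = priority - self.tree[idx]
        while idx >= 1:                    # propagate the change up to the root
            self.tree[idx] += delta
            idx //= 2

    def sample(self):
        """Walk down from the root, choosing children by cumulative priority."""
        s, idx = random.uniform(0, self.tree[1]), 1
        while idx < self.capacity:
            left = 2 * idx
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return self.data[idx - self.capacity], self.tree[idx]
```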
This paper proposes an improved decision-making method based on deep reinforcement learning to address on-ramp merging challenges in highway autonomous driving. A novel safety indicator, time difference to merging (TDTM), is introduced and used in conjunction with the classic time-to-collision (TTC) indicator to evaluate driving safety and assist the merging vehicle in finding a suitable gap in traffic, thereby enhancing driving safety. The autonomous driving agent is trained with the Deep Deterministic Policy Gradient (DDPG) algorithm, and an action-masking mechanism prevents unsafe actions during the policy exploration phase. Tested in on-ramp merging scenarios with different driving speeds in SUMO, the proposed DDPG+TDTM+TTC solution achieves a merging success rate of 99.96% without significantly impacting traffic efficiency on the main road, higher than that of DDPG+TTC and plain DDPG.
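A minimal sketch of how the two indicators could gate actions follows: TTC is the classic distance-over-closing-speed measure, and TDTM (as named in the paper) is taken here as the difference between the merging vehicle's and the gap vehicle's arrival times at the merge point. The exact definitions, thresholds, and state fields are illustrative assumptions.

```python
def time_to_collision(gap_m, closing_speed_mps):
    return float("inf") if closing_speed_mps <= 0 else gap_m / closing_speed_mps

def time_diff_to_merge(ego_dist_m, ego_speed_mps, other_dist_m, other_speed_mps):
    return ego_dist_m / max(ego_speed_mps, 0.1) - other_dist_m / max(other_speed_mps, 0.1)

def mask_unsafe_accel(candidate_accels, state, ttc_min=2.0, tdtm_min=1.0):
    """Drop accelerations that would leave TTC or |TDTM| below threshold."""
    safe = []
    for a in candidate_accels:
        v = state["ego_speed"] + a * state["dt"]
        ttc = time_to_collision(state["gap"], v - state["lead_speed"])
        tdtm = time_diff_to_merge(state["ego_dist"], v,
                                  state["other_dist"], state["other_speed"])
        if ttc >= ttc_min and abs(tdtm) >= tdtm_min:
            safe.append(a)
    return safe
```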
In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model for the missile and the virtual target is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the original DDPG algorithm. Furthermore, combining the established model, the network obtained by the improved DDPG algorithm, and the reward function, an intelligent missile terminal guidance scheme is proposed; specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results obtained by the proposed scheme are compared with those of other methods to further demonstrate its superior performance.
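The energy-center idea admits a very small sketch: since the seeker cannot separate target and decoy, it tracks their infrared-intensity-weighted centroid, the "virtual target". The positions and intensities below are illustrative assumptions.

```python
import numpy as np

def energy_center(positions, intensities):
    """positions: (N, 3) array; intensities: (N,) infrared energies."""
    w = np.asarray(intensities, dtype=float)
    return (np.asarray(positions, dtype=float) * w[:, None]).sum(0) / w.sum()

# Target at (1000, 0, 50) radiating 3 units, decoy at (980, 20, 50) radiating 1:
vt = energy_center([[1000, 0, 50], [980, 20, 50]], [3.0, 1.0])  # -> [995, 5, 50]
```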
Modeling a system in engineering applications is a time-consuming and labor-intensive task, as system parameters may change with temperature, component aging, and other factors. In this paper, a novel data-driven, model-free optimal controller based on deep deterministic policy gradient (DDPG) is proposed to address the problem of continuous-time leader-following multi-agent consensus. To deal with the dimensional explosion of the state and action spaces, two different types of neural networks are used to fit them instead of the time-consuming state-iteration process. With minimal energy consumption, the proposed controller achieves consensus based only on the consensus error and does not require any initial admissible policies. Besides, the controller is self-learning: it can achieve optimal control by learning in real time as the system parameters change. Finally, proofs of convergence and stability, as well as simulation experiments, are provided to verify the algorithm's effectiveness.
This paper proposes a robust and computationally efficient control method for damping ultra-low-frequency oscillations (ULFOs) in hydropower-dominated systems. Unlike the existing robust-optimization-based control formulation, which can only deal with a limited number of operating conditions, the proposed method reformulates the control problem into a bi-level robust parameter optimization model, allowing a wide range of system operating conditions to be considered. To speed up the bi-level optimization process, a deep deterministic policy gradient (DDPG)-based deep reinforcement learning algorithm is developed to train an intelligent agent that provides very fast lower-level decision variables for the upper-level model, significantly enhancing its computational efficiency. Simulation results demonstrate that the proposed method achieves much better damping control performance than the alternatives, with only slightly degraded dynamic response of the governor, under various operating conditions.