To improve the navigation capability of multi-unmanned-surface-vehicle (USV) formation systems, an Attention Mechanism based Multi-Agent Deep Deterministic Policy Gradient (ATMADDPG) algorithm is proposed. In the training stage, the algorithm learns an optimal policy through extensive trials; in the deployment stage, the trained policy is applied directly to obtain the optimal formation path. Simulation experiments use four identical "Baichuan" USVs as test subjects. The results show that the ATMADDPG-based formation-keeping strategy achieves stable multi-USV formation navigation and satisfies the formation-keeping requirements to a certain extent. Compared with the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, the proposed ATMADDPG algorithm exhibits superior convergence speed, formation-keeping ability, and adaptability to environmental changes, improving overall navigation efficiency by about 80% and showing considerable application potential.
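The attention step suggested by the ATMADDPG description above can be sketched as follows: each USV's critic summarizes the other agents' observation embeddings with softmax-weighted scores. This is a minimal illustrative sketch; the dimensions, the scaled-dot-product scoring, and the function names are assumptions, not the paper's actual architecture.

```python
import numpy as np

def attend(query: np.ndarray, neighbor_obs: np.ndarray) -> np.ndarray:
    """query: (d,) own embedding; neighbor_obs: (k, d) other agents' embeddings.
    Returns a (d,) attention-weighted summary of the neighbors."""
    # Scaled dot-product scores, one per neighbor.
    scores = neighbor_obs @ query / np.sqrt(query.size)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over the k neighbors
    return weights @ neighbor_obs       # convex combination of neighbor embeddings

rng = np.random.default_rng(7)
own = rng.normal(size=8)
others = rng.normal(size=(3, 8))        # three other USVs in the formation
context = attend(own, others)
print(context.shape)  # (8,)
```

In a multi-agent actor-critic setup, such a summary vector would be concatenated with the agent's own state before the critic's value estimate, letting each agent weight neighbors by relevance rather than treating them uniformly.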
Funding: Supported in part by the projects of the National Natural Science Foundation of China (62376059, 41971340); the Fujian Provincial Department of Science and Technology (2023XQ008, 2023I0024, 2021Y4019); the Fujian Provincial Department of Finance (GY-Z230007, GYZ23012); and the Fujian Key Laboratory of Automotive Electronics and Electric Drive (KF-19-22001).
Abstract: Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts pose a set of challenges to autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, rendering them a vital issue in the practical application of autonomous driving. To address these challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception-Enhanced Deep Deterministic Policy Gradient (PE-DDPG) model for autonomous driving in roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle's capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model combines a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and improving throughput. Extensive experiments were conducted on the collaborative simulation platform of CARLA and SUMO, and the results show that the proposed PE-DDPG outperforms the baseline methods in terms of training convergence, driving smoothness, and traffic efficiency across diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). In general, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
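The spatial attention mechanism mentioned in the abstract above can be illustrated with a small sketch: a convolutional feature map from the perception encoder is re-weighted per spatial location before being passed to the policy network, so that informative regions (e.g., approaching vehicles) dominate the latent state. This is a hedged toy version; the scoring function and dimensions are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def spatial_attention(feature_map: np.ndarray) -> np.ndarray:
    """Re-weight an (H, W, C) feature map by a per-location softmax score."""
    # Score each spatial location by its mean channel activation.
    scores = feature_map.mean(axis=-1)          # (H, W)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over all H*W locations
    return feature_map * weights[..., None]     # broadcast weights over channels

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 8, 16))              # toy encoder feature map
attended = spatial_attention(fmap)
print(attended.shape)  # (8, 8, 16)
```

In the PE-DDPG pipeline as described, such an attended map would feed the variational autoencoder's latent bottleneck, from which the DDPG actor selects continuous steering and throttle actions.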
Abstract: Deep deterministic policy gradient (DDPG) has been shown to be effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work probes this topic. Experiments showed that DDPG can not only markedly improve the convergence speed of MODPSO but also overcome the local-optimum problem from which MODPSO may suffer. These findings are significant for both the theoretical study and the application of MODPSO.
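The DDPG-optimizes-PSO setup described above can be sketched as a control loop in which an RL agent emits PSO hyperparameters (inertia weight w, cognitive and social coefficients c1, c2) at each iteration. In this minimal single-objective sketch, the agent is a stub returning fixed canonical values; a trained DDPG actor would replace it, and the paper's discrete multi-objective variant would need a different velocity/position encoding. All names here are illustrative.

```python
import random

def pso_step(positions, velocities, pbest, gbest, w, c1, c2):
    """One 1-D PSO velocity/position update for every particle."""
    new_pos, new_vel = [], []
    for x, v, p in zip(positions, velocities, pbest):
        r1, r2 = random.random(), random.random()
        nv = w * v + c1 * r1 * (p - x) + c2 * r2 * (gbest - x)
        new_vel.append(nv)
        new_pos.append(x + nv)
    return new_pos, new_vel

def agent_action(_state):
    # Stand-in for a DDPG actor's continuous action (w, c1, c2).
    return 0.7, 1.5, 1.5

random.seed(1)
positions = [random.uniform(-5, 5) for _ in range(10)]
velocities = [0.0] * 10
pbest = list(positions)
gbest = min(positions, key=lambda x: x * x)     # minimize f(x) = x^2
initial_best = gbest
for _ in range(50):
    w, c1, c2 = agent_action(None)
    positions, velocities = pso_step(positions, velocities, pbest, gbest, w, c1, c2)
    for i, x in enumerate(positions):
        if x * x < pbest[i] ** 2:               # personal best only improves
            pbest[i] = x
    gbest = min(pbest, key=lambda x: x * x)     # global best only improves
print(gbest)
```

In a full DDPG-for-PSO training loop, the swarm's statistics (best fitness, diversity) would form the state, and the reward would be the per-iteration improvement in the best objective value.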
Abstract: For an intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) simultaneous wireless information and power transfer (SWIPT) system, the beamforming vector at the base station and the reflective beamforming vector at the IRS are jointly optimized to maximize the information transmission rate, subject to the maximum transmit power of the base station, the unit-modulus constraint on the IRS reflection phase-shift matrix, and the minimum energy constraint of the energy receiver. To solve this non-convex optimization problem, a deep deterministic policy gradient (DDPG) algorithm based on deep reinforcement learning is proposed. Simulation results show that the average reward of the DDPG algorithm depends on the learning rate; with a suitable learning rate, DDPG attains average mutual information close to that of traditional optimization algorithms, while its running time is markedly lower than that of traditional non-convex optimization algorithms. Even as the numbers of antennas and reflecting elements increase, the DDPG algorithm still converges within a short time. This demonstrates that DDPG can effectively improve computational efficiency and is better suited to communication services with strict real-time requirements.
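The quantity being optimized above can be made concrete with a small sketch: the effective channel of an IRS-assisted MISO link as a function of the unit-modulus reflection phases, with the achievable rate computed under simple maximum-ratio transmission. The channel model, random placeholder values, and variable names are assumptions for illustration, not the paper's system parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
M, N = 4, 16                            # BS antennas, IRS reflecting elements
# Random placeholder channels (Rayleigh-like), not from the paper:
h_d = rng.normal(size=M) + 1j * rng.normal(size=M)            # direct BS -> user
G = rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))    # BS -> IRS
h_r = rng.normal(size=N) + 1j * rng.normal(size=N)            # IRS -> user

def effective_channel(theta: np.ndarray) -> np.ndarray:
    """theta: IRS phase shifts in radians. The reflection coefficients
    e^{j*theta} have unit modulus, matching the constraint in the abstract."""
    phi = np.exp(1j * theta)            # |phi_n| = 1 for every element
    return h_d + G.conj().T @ (phi * h_r)

theta = rng.uniform(0, 2 * np.pi, size=N)   # a DDPG actor would output this
h_eff = effective_channel(theta)
# Rate under maximum-ratio transmission at the BS:
P, sigma2 = 1.0, 1.0
rate = np.log2(1 + P * np.linalg.norm(h_eff) ** 2 / sigma2)
print(rate > 0)
```

In the DDPG formulation described, the phase vector `theta` (together with the BS beamformer) is the continuous action, and a reward built from this rate drives the policy toward the joint optimum without solving the non-convex problem directly.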
Funding: Supported in part by the Hubei Provincial Science and Technology Major Project of China (Grant No. 2020AEA011); in part by the National Ethnic Affairs Commission of the People's Republic of China (Training Program for Young and Middle-aged Talents) (No. MZR20007); in part by the National Natural Science Foundation of China (Grant No. 61902437); in part by the Hubei Provincial Natural Science Foundation of China (Grant No. 2020CFB629); in part by the Application Foundation Frontier Project of the Wuhan Science and Technology Program (Grant No. 2020020601012267); in part by the Fundamental Research Funds for the Central Universities, South-Central MinZu University (No. CZQ21026); and in part by the Special Project on Regional Collaborative Innovation of Xinjiang Uygur Autonomous Region (Plan to Aid Xinjiang with Science and Technology) (2022E02035).
Abstract: The path planning of an Unmanned Aerial Vehicle (UAV) is a critical issue in emergency communication and rescue operations, especially in adversarial urban environments. Due to the continuity of the flying space, complex building obstacles, and the aircraft's high dynamics, traditional algorithms cannot find the optimal collision-free flying path between the UAV station and the destination. Accordingly, this paper studies the fast UAV path planning problem in a 3D urban environment from a source point to a target point and proposes a Three-Step Experience Buffer Deep Deterministic Policy Gradient (TSEB-DDPG) algorithm. We first build a 3D model of a complex urban environment with buildings and project the 3D building surfaces into many 2D geometric shapes. After this transformation, we apply Hierarchical Learning Particle Swarm Optimization (HL-PSO) to obtain an empirical path. Then, to ensure the accuracy of the obtained paths, the empirical path, collision information, and fast transition information are stored in the three experience buffers of the TSEB-DDPG algorithm as dynamic guidance information, with the sampling ratio of each buffer dynamically adapted to the training stage. Moreover, we design a reward mechanism to improve the convergence speed of the DDPG algorithm for UAV path planning. The proposed TSEB-DDPG algorithm is compared experimentally with three widely used competitors, and the results show that it achieves the fastest convergence speed and the highest accuracy. We also conduct experiments in real scenarios and compare the actual paths planned by the HL-PSO, DDPG, and TSEB-DDPG algorithms. The results show that the TSEB-DDPG algorithm performs nearly the best in terms of accuracy, the average time of actual path planning, and the success rate.
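The three-buffer sampling idea described above can be sketched as follows: transitions are drawn from an "empirical path" buffer (seeded by HL-PSO), a "collision" buffer, and a "fast transition" buffer, with ratios that shift as training progresses. The buffer names and the linear ratio schedule are illustrative assumptions; the paper's exact schedule is not reproduced here.

```python
import random

def sampling_ratios(progress: float):
    """Early training leans on the empirical (HL-PSO) paths; later
    training shifts weight toward the agent's own fast transitions.
    progress is in [0, 1]; the schedule here is an assumed example."""
    empirical = max(0.1, 0.6 * (1.0 - progress))
    collision = 0.2
    fast = 1.0 - empirical - collision
    return empirical, collision, fast

def sample_batch(buffers, progress, batch_size, rng):
    """Draw a mixed batch, one slice per buffer, sized by the ratios."""
    batch = []
    for buf, ratio in zip(buffers, sampling_ratios(progress)):
        k = max(1, round(batch_size * ratio))
        batch.extend(rng.choices(buf, k=min(k, batch_size)))
    return batch[:batch_size]

rng = random.Random(0)
# Toy buffers tagged by origin, standing in for stored transitions:
buffers = (["emp"] * 100, ["col"] * 100, ["fast"] * 100)
early = sample_batch(buffers, 0.0, batch_size=32, rng=rng)
late = sample_batch(buffers, 1.0, batch_size=32, rng=rng)
print(early.count("emp"), late.count("emp"))
```

The design intent mirrors the abstract: guidance-heavy batches early on accelerate convergence, while later batches emphasize the agent's own high-quality transitions so the learned policy is not anchored to the heuristic path.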