Funding: supported by the National Natural Science Foundation of China (61922063, 62273255, 62150026), in part by the Shanghai International Science and Technology Cooperation Project (21550760900, 22510712000), by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100), and by the Fundamental Research Funds for the Central Universities.
Abstract: Dear Editor, In this letter, the multi-objective optimal control problem of nonlinear discrete-time systems is investigated. A data-driven policy gradient algorithm is proposed in which the action-state value function (Q-function) is used to evaluate the policy. In the policy improvement step, a policy-gradient-based method is employed.
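The letter gives no implementation details, so the following is only a minimal, hypothetical sketch of its two named steps under stated assumptions. It uses a scalar linear plant x' = a*x + b*u rather than a nonlinear one, so the result can be checked against the Riccati solution. Two quadratic objectives (state deviation and control effort) are combined by a fixed weighted sum, which is itself an assumption. The policy is linear, u = -k*x, and the Q-function is quadratic. Policy evaluation fits Q to sampled transitions by least squares on the Bellman equation, and policy improvement takes a deterministic-policy-gradient step on the gain k. The plant parameters a and b only generate data; the learner never reads them.

```python
import random

def solve3(A, y):
    """Solve the 3x3 system A z = y (Gaussian elimination, partial pivoting)."""
    M = [row[:] + [v] for row, v in zip(A, y)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    z = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        z[r] = (M[r][3] - sum(M[r][c] * z[c] for c in range(r + 1, 3))) / M[r][r]
    return z

def phi(x, u):
    # Quadratic basis so that Q(x, u) = th0*x^2 + 2*th1*x*u + th2*u^2.
    return (x * x, 2.0 * x * u, u * u)

def evaluate_q(k, data, weights, gamma):
    """Policy evaluation: least-squares fit of Q^k to the Bellman equation
    Q(x, u) = cost(x, u) + gamma * Q(x', -k x'), using only sampled transitions."""
    A = [[0.0] * 3 for _ in range(3)]
    y = [0.0] * 3
    for x, u, xn in data:
        cost = weights[0] * x * x + weights[1] * u * u  # weighted sum of two objectives (assumption)
        psi = [p - gamma * q for p, q in zip(phi(x, u), phi(xn, -k * xn))]
        for i in range(3):
            y[i] += psi[i] * cost
            for j in range(3):
                A[i][j] += psi[i] * psi[j]
    return solve3(A, y)

def train(a=0.9, b=1.0, weights=(1.0, 0.5), gamma=0.95, k=0.0,
          lr=0.3, iters=200, n=200, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        # The plant (a, b) only generates data; the learner never reads a or b.
        data = []
        for _ in range(n):
            x = rng.uniform(-1.0, 1.0)
            u = -k * x + rng.uniform(-0.5, 0.5)  # exploratory action
            data.append((x, u, a * x + b * u))
        th = evaluate_q(k, data, weights, gamma)
        # Deterministic policy gradient: dQ/dk = dQ/du * du/dk, with du/dk = -x.
        grad = sum((2 * th[1] * x - 2 * th[2] * k * x) * (-x) for x, _, _ in data) / n
        k -= lr * grad
    return k
```

With the defaults, `train()` should settle at the discounted-LQR gain for the weighted cost, about 0.64 here, and the scalar Riccati recursion can confirm this. Weighted-sum scalarization is only one way to handle several objectives and may differ from the letter's formulation.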
Funding: supported in part by the National Natural Science Foundation of China (62376059, 41971340), the Fujian Provincial Department of Science and Technology (2023XQ008, 2023I0024, 2021Y4019), the Fujian Provincial Department of Finance (GY-Z230007, GYZ23012), and the Fujian Key Laboratory of Automotive Electronics and Electric Drive (KF-19-22001).
Abstract: Autonomous driving has advanced rapidly; however, ensuring safe and efficient driving in complex scenarios remains a critical challenge. Traffic roundabouts are especially difficult for autonomous driving. Vehicles enter and exit unpredictably, traffic flow is prone to bottlenecks, and the environmental perception data are imperfect. These factors make roundabouts a key issue in the practical deployment of autonomous driving. To address these challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) method for autonomous driving in roundabouts. Specifically, the model combines an enhanced variational autoencoder with an integrated spatial attention mechanism and the Deep Deterministic Policy Gradient framework, improving the vehicle's ability to understand complex roundabout environments and make decisions. Furthermore, PE-DDPG incorporates a dynamic path optimization strategy for roundabout scenarios, which mitigates traffic bottlenecks and increases throughput. Extensive experiments were conducted on a CARLA-SUMO co-simulation platform. The results show that PE-DDPG outperforms the baseline methods in training convergence, driving smoothness, and traffic efficiency across diverse traffic-flow patterns and autonomous vehicle (AV) penetration rates. Overall, the proposed PE-DDPG model could be applied to autonomous driving in complex scenarios with imperfect data.
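The abstract mentions a spatial attention mechanism inside the variational autoencoder but does not specify its form. This sketch assumes the widely used CBAM-style spatial attention and shows the data flow. Each location of the feature map is pooled across channels (mean and max), the two pooled maps are convolved with a small kernel, and a sigmoid turns the result into a per-location weight that rescales every channel. A bird's-eye occupancy grid of the roundabout is one plausible input. The kernel weights here are illustrative, not learned, and the function names are hypothetical.

```python
import math

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def spatial_attention(feat, kernel, bias=0.0):
    """CBAM-style spatial attention on a C x H x W feature map stored as nested lists.

    kernel: 2 x k x k weights (k odd) applied to the stacked [channel-mean, channel-max] maps.
    Returns (reweighted feature map, H x W attention mask with values in (0, 1)).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Pool across channels at every spatial location.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pooled = (avg, mx)
    k = len(kernel[0])
    p = k // 2
    mask = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = bias
            for ch in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - p, j + dj - p
                        if 0 <= ii < H and 0 <= jj < W:  # zero padding at the borders
                            s += kernel[ch][di][dj] * pooled[ch][ii][jj]
            mask[i][j] = sigmoid(s)
    # The same spatial weight rescales every channel at that location.
    out = [[[feat[c][i][j] * mask[i][j] for j in range(W)] for i in range(H)] for c in range(C)]
    return out, mask
```

In a trained model the kernel would be learned jointly with the encoder. The plain-list version only makes explicit how occupied cells can be emphasised before the latent encoding reaches the DDPG actor and critic.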
Abstract: Deep deterministic policy gradient (DDPG) has been shown to be effective in optimizing particle swarm optimization (PSO), but whether DDPG can also optimize multi-objective discrete particle swarm optimization (MODPSO) remains an open question, which this work investigates. Experiments show that DDPG not only speeds up the convergence of MODPSO but also helps overcome the local-optimum problem to which MODPSO is prone. These findings are significant for both the theoretical study and the application of MODPSO.
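The abstract does not say which part of MODPSO the DDPG agent controls. This sketch assumes one common pattern for RL-tuned PSO: the agent chooses the inertia weight and acceleration coefficients at every iteration. For brevity, the coupling point is shown on a plain continuous, single-objective, global-best PSO, and MODPSO's discrete encoding and Pareto archive are omitted. The hand-written `decaying_inertia` schedule is a hypothetical placeholder for the trained actor. It starts from the standard constriction-equivalent settings (w = 0.729, c1 = c2 = 1.49445).

```python
import random

def pso(f, dim, bounds, controller, n_particles=20, iters=300, seed=0):
    """Global-best PSO whose coefficients (w, c1, c2) are set each iteration by `controller`.

    `controller(state) -> (w, c1, c2)` is a hypothetical stand-in for a trained DDPG actor;
    `state` is (progress in [0, 1), best value so far), an assumed observation.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    vmax = 0.2 * (hi - lo)  # velocity clamp keeps steps in a sensible range
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pval = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]
    for t in range(iters):
        w, c1, c2 = controller((t / iters, gval))  # the agent's "action"
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])   # cognitive pull
                     + c2 * r2 * (gbest[d] - xs[i][d]))     # social pull
                vs[i][d] = max(-vmax, min(vmax, v))
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))  # stay inside the box
            val = f(xs[i])
            if val < pval[i]:
                pbest[i], pval[i] = xs[i][:], val
                if val < gval:
                    gbest, gval = xs[i][:], val
    return gbest, gval

def decaying_inertia(state):
    """Hand-written schedule standing in for the learned DDPG policy."""
    progress, _best = state
    return 0.729 - 0.229 * progress, 1.49445, 1.49445
```

In an RL setting the reward for each controller call would typically reflect the improvement in solution quality (for a multi-objective swarm, an archive quality measure such as hypervolume). That reward design is also an assumption, not taken from the abstract.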
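Abstract: This work considers an intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) simultaneous wireless information and power transfer (SWIPT) system. The beamforming vector at the base station and the reflective beamforming vector at the IRS are jointly optimized to maximize the information transmission rate. The optimization is subject to the base station's maximum transmit power, the unit-modulus constraint on the IRS reflection phase-shift matrix, and the minimum energy constraint of the energy receiver. To solve the resulting non-convex optimization problem, a deep deterministic policy gradient (DDPG) algorithm based on deep reinforcement learning is proposed. Simulation results show that the average reward of DDPG depends on the learning rate. With a properly chosen learning rate, DDPG achieves average mutual information close to that of conventional optimization algorithms, while its running time is markedly lower than that of conventional non-convex optimization algorithms. Even as the numbers of antennas and reflecting elements increase, DDPG still converges within a short time. This indicates that DDPG can effectively improve computational efficiency and is better suited to communication services with stringent real-time requirements.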
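To make the DDPG formulation concrete, here is a sketch of one way an actor's raw output could be turned into a feasible decision and a scalar reward. Several assumptions are not stated in the abstract: a single information receiver and a single energy receiver, direct BS-receiver links plus the IRS path, a linear energy-harvesting model with efficiency `eta`, and a penalty term for violating the minimum-energy constraint. Scaling w to the power budget enforces the transmit-power constraint. Mapping each action entry a to exp(j*pi*a) satisfies the unit-modulus constraint by construction. Channel names (`h_d`, `h_r`, `G`, `g_d`, `g_r`) and the action layout are hypothetical.

```python
import cmath
import math

def action_to_variables(action, n_bs, n_irs, p_max):
    """Map a raw actor output in [-1, 1]^(2*n_bs + n_irs) to a feasible (w, theta).

    Hypothetical layout: entries 0..2*n_bs-1 hold Re/Im parts of w, the rest hold IRS phases.
    """
    w = [complex(action[2 * i], action[2 * i + 1]) for i in range(n_bs)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in w))
    if norm > 0:
        w = [x * math.sqrt(p_max) / norm for x in w]  # ||w||^2 = p_max
    # Phase in [-pi, pi]; |theta_m| = 1 holds by construction.
    theta = [cmath.exp(1j * math.pi * a) for a in action[2 * n_bs:2 * n_bs + n_irs]]
    return w, theta

def effective_gain(h_d, h_r, G, theta, w):
    """(h_d^H + h_r^H diag(theta) G) w for one receiver; G is n_irs x n_bs (BS -> IRS)."""
    total = 0j
    for n in range(len(w)):
        coeff = h_d[n].conjugate() + sum(
            h_r[m].conjugate() * theta[m] * G[m][n] for m in range(len(theta)))
        total += coeff * w[n]
    return total

def reward(action, chans, p_max, noise_power, eta, e_min, penalty=10.0):
    """Information-receiver rate, penalised if the harvested energy falls below e_min."""
    n_bs, n_irs = len(chans["h_d"]), len(chans["h_r"])
    w, theta = action_to_variables(action, n_bs, n_irs, p_max)
    y_ir = effective_gain(chans["h_d"], chans["h_r"], chans["G"], theta, w)
    y_er = effective_gain(chans["g_d"], chans["g_r"], chans["G"], theta, w)
    rate = math.log2(1.0 + abs(y_ir) ** 2 / noise_power)  # bits/s/Hz
    energy = eta * abs(y_er) ** 2  # linear energy-harvesting model (assumption)
    return rate - penalty * max(0.0, e_min - energy)
```

Because power scaling and the phase mapping keep every action feasible, the actor never has to learn the power and unit-modulus constraints. Only the energy constraint is handled through the reward.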