Journal Articles
298 articles found
Perception Enhanced Deep Deterministic Policy Gradient for Autonomous Driving in Complex Scenarios
1
Authors: Lyuchao Liao, Hankun Xiao, Pengqi Xing, Zhenhua Gan, Youpeng He, Jiajun Wang 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024, Issue 7, pp. 557-576 (20 pages)
Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts bring a set of challenges to autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, rendering them a vital issue in the practical application of autonomous driving. To address these traffic challenges, this work focused on complex multi-lane roundabouts and proposed a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) for autonomous driving in roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle's capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model combines a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and augmenting throughput efficiency. Extensive experiments were conducted with the collaborative simulation platform of CARLA and SUMO, and the experimental results show that the proposed PE-DDPG outperforms the baseline methods in terms of the convergence capacity of the training process, the smoothness of driving, and the traffic efficiency with diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). Generally, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
Keywords: Autonomous driving; traffic roundabouts; deep deterministic policy gradient; spatial attention mechanisms
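The entries in this list all build on the same DDPG core, so a minimal sketch of one DDPG update step may help readers new to the method. This is a generic illustration in PyTorch, not the PE-DDPG model from the paper above; the network sizes, learning rates, and state/action dimensions are assumptions.

```python
# A minimal, generic DDPG update step -- illustrative only, not the paper's PE-DDPG model.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256, out_act=None):
    layers = [nn.Linear(in_dim, hidden), nn.ReLU(),
              nn.Linear(hidden, hidden), nn.ReLU(),
              nn.Linear(hidden, out_dim)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

state_dim, action_dim, gamma, tau = 16, 2, 0.99, 0.005           # assumed dimensions
actor,   critic   = mlp(state_dim, action_dim, out_act=nn.Tanh()), mlp(state_dim + action_dim, 1)
actor_t, critic_t = mlp(state_dim, action_dim, out_act=nn.Tanh()), mlp(state_dim + action_dim, 1)
actor_t.load_state_dict(actor.state_dict()); critic_t.load_state_dict(critic.state_dict())
actor_opt  = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One gradient step on a minibatch (s, a, r, s2, done) of float tensors."""
    with torch.no_grad():                                         # TD target from target networks
        q_target = r + gamma * (1 - done) * critic_t(torch.cat([s2, actor_t(s2)], dim=-1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=-1)), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean() # deterministic policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for net, net_t in ((actor, actor_t), (critic, critic_t)):     # Polyak (soft) target update
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```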
Optimizing the Multi-Objective Discrete Particle Swarm Optimization Algorithm by Deep Deterministic Policy Gradient Algorithm
2
Authors: Sun Yang-Yang, Yao Jun-Ping, Li Xiao-Jun, Fan Shou-Xiang, Wang Zi-Wei 《Journal on Artificial Intelligence》 2022, Issue 1, pp. 27-35 (9 pages)
Deep deterministic policy gradient (DDPG) has been proved to be effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work probes into this topic. Experiments showed that DDPG can not only quickly improve the convergence speed of MODPSO but also overcome the local-optimum problem from which MODPSO may suffer. The findings are of significance for the theoretical research and application of MODPSO.
Keywords: deep deterministic policy gradient; multi-objective discrete particle swarm optimization; deep reinforcement learning; machine learning
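As a hedged illustration of how a DDPG action could steer a particle swarm optimizer, the sketch below maps a continuous action to standard PSO coefficients (inertia and acceleration weights) and applies one velocity update. The mapping ranges and the plain PSO update are assumptions; the paper's MODPSO variant is not reproduced here.

```python
# Hedged sketch: a DDPG action in [-1, 1]^3 sets the PSO coefficients each iteration.
import numpy as np

rng = np.random.default_rng(0)

def action_to_coeffs(action):
    """Map an action in [-1, 1]^3 to (w, c1, c2) within assumed ranges."""
    w  = 0.4 + 0.5 * (action[0] + 1) / 2      # inertia  w  in [0.4, 0.9]
    c1 = 0.5 + 2.0 * (action[1] + 1) / 2      # cognitive c1 in [0.5, 2.5]
    c2 = 0.5 + 2.0 * (action[2] + 1) / 2      # social    c2 in [0.5, 2.5]
    return w, c1, c2

def pso_step(x, v, pbest, gbest, action):
    """One velocity/position update with RL-chosen coefficients."""
    w, c1, c2 = action_to_coeffs(action)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# usage: x, v = pso_step(x, v, pbest, gbest, ddpg_agent_action)
```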
Real-Time Implementation of Quadrotor UAV Control System Based on a Deep Reinforcement Learning Approach
3
Authors: Taha Yacine Trad, Kheireddine Choutri, Mohand Lagha, Souham Meshoul, Fouad Khenfri, Raouf Fareh, Hadil Shaiba 《Computers, Materials & Continua》 SCIE EI 2024, Issue 12, pp. 4757-4786 (30 pages)
The popularity of quadrotor Unmanned Aerial Vehicles (UAVs) stems from their simple propulsion systems and structural design. However, their complex and nonlinear dynamic behavior presents a significant challenge for control, necessitating sophisticated algorithms to ensure stability and accuracy in flight. Various strategies have been explored by researchers and control engineers, with learning-based methods like reinforcement learning, deep learning, and neural networks showing promise in enhancing the robustness and adaptability of quadrotor control systems. This paper investigates a Reinforcement Learning (RL) approach for both high- and low-level quadrotor control systems, focusing on attitude stabilization and position tracking tasks. A novel reward function and actor-critic network structures are designed to stimulate high-order observable states, improving the agent's understanding of the quadrotor's dynamics and environmental constraints. To address the challenge of RL hyper-parameter tuning, a new framework is introduced that combines Simulated Annealing (SA) with a reinforcement learning algorithm, specifically Simulated Annealing-Twin Delayed Deep Deterministic Policy Gradient (SA-TD3). This approach is evaluated for path-following and stabilization tasks through comparative assessments with two commonly used control methods: Backstepping and Sliding Mode Control (SMC). While the implementation of the well-trained agents exhibited unexpected behavior during real-world testing, a reduced neural network used for altitude control was successfully implemented on a Parrot Mambo mini drone. The results showcase the potential of the proposed SA-TD3 framework for real-world applications, demonstrating improved stability and precision across various test scenarios and highlighting its feasibility for practical deployment.
Keywords: deep reinforcement learning; hyper-parameter optimization; path following; quadrotor; twin delayed deep deterministic policy gradient; simulated annealing
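The simulated-annealing wrapper described above can be pictured as an outer loop that perturbs RL hyper-parameters and accepts worse candidates with a temperature-dependent probability. The sketch below assumes a hypothetical `train_and_evaluate` callback that trains a TD3 agent with the given hyper-parameters and returns a scalar score; the parameter names, perturbation range, and cooling schedule are illustrative.

```python
# Simulated annealing over RL hyper-parameters (illustrative sketch of the SA-TD3 idea).
import math
import random

random.seed(0)

def anneal_hyperparams(train_and_evaluate, T0=1.0, cooling=0.9, steps=20):
    params = {"actor_lr": 1e-4, "critic_lr": 1e-3, "tau": 0.005}   # assumed starting point
    score = train_and_evaluate(params)
    best, best_score, T = dict(params), score, T0
    for _ in range(steps):
        # perturb each hyper-parameter multiplicatively on a log scale
        cand = {k: v * 10 ** random.uniform(-0.3, 0.3) for k, v in params.items()}
        cand_score = train_and_evaluate(cand)
        # accept better candidates always, worse ones with Boltzmann probability
        if cand_score > score or random.random() < math.exp((cand_score - score) / T):
            params, score = cand, cand_score
            if score > best_score:
                best, best_score = dict(params), score
        T *= cooling                                               # cool the temperature
    return best, best_score
```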
Enhanced Deep Reinforcement Learning Strategy for Energy Management in Plug-in Hybrid Electric Vehicles with Entropy Regularization and Prioritized Experience Replay
4
Authors: Li Wang, Xiaoyong Wang 《Energy Engineering》 EI 2024, Issue 12, pp. 3953-3979 (27 pages)
Plug-in Hybrid Electric Vehicles (PHEVs) represent an innovative breed of transportation, harnessing diverse power sources for enhanced performance. Energy management strategies (EMSs) that coordinate and control different energy sources are a critical component of PHEV control technology, directly impacting overall vehicle performance. This study proposes an improved deep reinforcement learning (DRL)-based EMS that optimizes real-time energy allocation and coordinates the operation of multiple power sources. Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces. They often fail to strike an optimal balance between exploration and exploitation, and their assumption of a static environment limits their ability to adapt to changing conditions. Moreover, these algorithms suffer from low sample efficiency. Collectively, these factors contribute to convergence difficulties, low learning efficiency, and instability. To address these challenges, the Deep Deterministic Policy Gradient (DDPG) algorithm is enhanced using entropy regularization and a summation-tree-based Prioritized Experience Replay (PER) method, aiming to improve exploration performance and learning efficiency from experience samples. Additionally, the corresponding Markov Decision Process (MDP) is established. Finally, an EMS based on the improved DRL model is presented. Comparative simulation experiments are conducted against rule-based, optimization-based, and DRL-based EMSs. The proposed strategy exhibits minimal deviation from the optimal solution obtained by the dynamic programming (DP) strategy that requires global information. In typical driving scenarios based on the World Light Vehicle Test Cycle (WLTC) and the New European Driving Cycle (NEDC), the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption (EFC) of 2696.77 g. Compared to the DP strategy baseline, the proposed method improved the fuel efficiency variances (FEV) by 18.13%, 15.1%, and 8.37% over the Deep Q-Network (DQN), Double DRL (DDRL), and original DDPG methods, respectively. The results demonstrate that the proposed EMS based on the improved DRL framework possesses good real-time performance, stability, and reliability, effectively optimizing vehicle economy and fuel consumption.
Keywords: Plug-in hybrid electric vehicles; deep reinforcement learning; energy management strategy; deep deterministic policy gradient; entropy regularization; prioritized experience replay
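A summation tree is the data structure that makes proportional prioritized experience replay efficient: priorities sit at the leaves, internal nodes store subtree sums, and sampling walks down from the root with a prefix-sum value. The compact sketch below is illustrative and omits the priority exponent and importance-sampling weights used in full PER implementations.

```python
# A compact sum-tree for proportional prioritized experience replay (illustrative sketch).
import numpy as np

class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity)     # leaves hold priorities, internal nodes hold sums
        self.write = 0                         # next leaf slot to overwrite

    def add(self, priority):
        idx = self.write + self.capacity       # leaf indices live in [capacity, 2*capacity)
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        while idx >= 1:                        # propagate the change up to the root (index 1)
            self.tree[idx] += change
            idx //= 2

    def sample(self, value):
        """Return the leaf slot whose cumulative priority range contains `value`."""
        idx = 1
        while idx < self.capacity:
            left = 2 * idx
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - self.capacity

# usage: tree = SumTree(1024); tree.add(1.0); slot = tree.sample(np.random.uniform(0, tree.tree[1]))
```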
Compilation Optimization of Onboard Deep Computing for UAV Cooperative Localization
5
Authors: 熊康, 刘思聪, 王宏涛, 高元, 郭斌, 於志文 《计算机科学与探索》 北大核心 2025, Issue 1, pp. 141-157 (17 pages)
With the rapid development of UAV technology, localizing UAVs when positioning signals are unavailable has become a challenging research problem. The emergence of graph neural networks in recent years offers a new way to address it, but deploying graph neural networks on resource-constrained UAVs faces challenges such as limited onboard computing and storage resources and stringent real-time requirements. This paper proposes a compilation optimization method for onboard deep computing oriented to UAV cooperative localization. A lightweight temporal graph convolutional network model, composed of a graph convolutional network and gated recurrent units, is adopted to capture both the spatial dependencies within the UAV swarm and the temporal dependencies of UAV position changes, enabling accurate prediction of swarm positions. To exploit the redundancy of this model in the temporal graph convolution, a globally adaptive pruning algorithm based on reverse Cuthill-McKee graph reordering and a double deep deterministic policy gradient is proposed. While preserving accurate prediction of swarm coordinates, the method improves the spatial locality of data in main memory and accelerates model computation, and it performs adaptive unstructured pruning to reduce storage complexity. Experimental results show that, compared with the existing temporal graph convolutional network model, the compiler-optimized lightweight model retains 78.8% accuracy while reducing computation time by 37.9%, with an average pruning rate of 90.3%.
Keywords: temporal graph convolutional network; cooperative localization; channel pruning; graph reordering algorithm; deep deterministic policy gradient
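The reverse Cuthill-McKee step mentioned in the abstract can be reproduced with SciPy's graph routines: the permutation clusters the non-zeros of the swarm adjacency matrix near the diagonal, which is what improves spatial locality in main memory. The toy adjacency matrix below is made up for demonstration.

```python
# Reverse Cuthill-McKee reordering of a toy UAV-swarm adjacency matrix (illustrative).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

adj = csr_matrix(np.array([            # symmetric adjacency of a 5-UAV swarm (made up)
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0]], dtype=float))

perm = reverse_cuthill_mckee(adj, symmetric_mode=True)   # new node ordering
adj_reordered = adj[perm, :][:, perm]                    # apply the permutation to rows and columns
print(perm)
print(adj_reordered.toarray())                           # non-zeros now cluster near the diagonal
```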
Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs (Cited by 12)
6
Authors: LI Yue, QIU Xiaohui, LIU Xiaodong, XIA Qunli 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2020, Issue 4, pp. 734-742 (9 pages)
The ever-changing battlefield environment requires the use of robust and adaptive technologies integrated into a reliable platform. Unmanned combat aerial vehicles (UCAVs) aim to integrate such advanced technologies while increasing the tactical capabilities of combat aircraft. Taking a common UCAV as the research object, the neural network fitting strategy is typically used to obtain values of attack areas. However, this simple strategy cannot cope with complex environmental changes and autonomously optimize decision-making problems. To solve the problem, this paper proposes a new deep deterministic policy gradient (DDPG) strategy based on deep reinforcement learning for the attack area fitting of UCAVs in the future battlefield. Simulation results show that the autonomy and environmental adaptability of UCAVs in the future battlefield will be improved with the new DDPG algorithm and that the training process converges quickly. The optimal values of attack areas can be obtained in real time during the whole flight with the well-trained deep network.
Keywords: attack area; neural network; deep deterministic policy gradient (DDPG); unmanned combat aerial vehicle (UCAV)
Moving target defense of routing randomization with deep reinforcement learning against eavesdropping attack (Cited by 4)
7
Authors: Xiaoyu Xu, Hao Hu, Yuling Liu, Jinglei Tan, Hongqi Zhang, Haotian Song 《Digital Communications and Networks》 SCIE CSCD 2022, Issue 3, pp. 373-387 (15 pages)
Eavesdropping attacks have become one of the most common attacks on networks because of their easy implementation. Eavesdropping attacks not only lead to transmission data leakage but also develop into other more harmful attacks. Routing randomization is a relevant research direction for moving target defense, which has been proven to be an effective method to resist eavesdropping attacks. To counter eavesdropping attacks, in this study, we analyzed the existing routing randomization methods and found that their security and usability need to be further improved. According to the characteristics of eavesdropping attacks, which are “latent and transferable”, a routing randomization defense method based on deep reinforcement learning is proposed. The proposed method realizes routing randomization on packet-level granularity using programmable switches. To improve the security and quality of service of legitimate services in networks, we use the deep deterministic policy gradient to generate random routing schemes with support from powerful network state awareness. In-band network telemetry provides real-time, accurate, and comprehensive network state awareness for the proposed method. Various experiments show that compared with other typical routing randomization defense methods, the proposed method has obvious advantages in security and usability against eavesdropping attacks.
Keywords: Routing randomization; moving target defense; deep reinforcement learning; deep deterministic policy gradient
Distributed optimization of electricity-Gas-Heat integrated energy system with multi-agent deep reinforcement learning (Cited by 4)
8
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang 《Global Energy Interconnection》 EI CAS CSCD 2022, Issue 6, pp. 604-617 (14 pages)
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) has the characteristics of strong coupling, non-convexity, and nonlinearity. The centralized optimization method has a high cost of communication and complex modeling. Meanwhile, the traditional numerical iterative solution cannot deal with uncertainty and lacks solution efficiency, which makes it difficult to apply online. For the coordinated optimization problem of the electricity-gas-heat IES in this study, we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient. Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy. Compared with centralized optimization, the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load in the training. Compared with the traditional iterative solution method, it can better cope with uncertainty and realize real-time decision making of the system, which is conducive to online application. Finally, we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.
Keywords: Integrated energy system; multi-agent system; distributed optimization; multi-agent deep deterministic policy gradient; real-time optimization decision
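The multi-agent DDPG structure referred to above is usually summarized as "decentralized actors, centralized critics": each agent's actor sees only its local observation, while its critic is trained on the joint observations and actions of all agents. The sketch below shows only that wiring, with assumed dimensions and layer sizes rather than the paper's energy-hub model.

```python
# Hedged sketch of the MADDPG actor/critic wiring (dimensions are assumptions).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2

class Actor(nn.Module):                      # decentralized: local observation -> local action
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):              # centralized: joint obs + joint actions -> Q-value
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs.flatten(1), all_act.flatten(1)], dim=-1))

actors  = [Actor() for _ in range(N_AGENTS)]
critics = [CentralCritic() for _ in range(N_AGENTS)]

obs  = torch.randn(4, N_AGENTS, OBS_DIM)                      # batch of joint observations
acts = torch.stack([actors[i](obs[:, i]) for i in range(N_AGENTS)], dim=1)
q_values = critics[0](obs, acts)                              # agent 0's centralized Q estimate
```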
RIS-Assisted UAV-D2D Communications Exploiting Deep Reinforcement Learning
9
Authors: YOU Qian, XU Qian, YANG Xin, ZHANG Tao, CHEN Ming 《ZTE Communications》 2023, Issue 2, pp. 61-69 (9 pages)
Device-to-device (D2D) communications underlying cellular networks enabled by unmanned aerial vehicles (UAV) have been regarded as promising techniques for next-generation communications. To mitigate the strong interference caused by the line-of-sight (LoS) air-to-ground channels, we deploy a reconfigurable intelligent surface (RIS) to rebuild the wireless channels. A joint optimization problem of the transmit power of the UAV, the transmit power of the D2D users, and the RIS phase configuration is investigated to maximize the achievable rate of the D2D users while satisfying the quality of service (QoS) requirement of cellular users. Due to the high channel dynamics and the coupling among the cellular users, the RIS, and the D2D users, it is challenging to find a proper solution. Thus, a RIS softmax deep double deterministic (RIS-SD3) policy gradient method is proposed, which can smooth the optimization space as well as reduce the number of local optimizations. Specifically, the SD3 algorithm maximizes the reward of the agent by training the agent to maximize the value function after the softmax operator is introduced. Simulation results show that the proposed RIS-SD3 algorithm can significantly improve the rate of the D2D users while controlling the interference to the cellular user. Moreover, the proposed RIS-SD3 algorithm has better robustness than the twin delayed deep deterministic (TD3) policy gradient algorithm in a dynamic environment.
Keywords: device-to-device communications; reconfigurable intelligent surface; deep reinforcement learning; softmax deep double deterministic policy gradient
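The softmax operator that distinguishes SD3-style methods from TD3 replaces the hard max/min over target Q-values with a softmax-weighted average over actions sampled around the target policy. The sketch below shows one such soft value estimate; `actor_t` and `critic_t` stand for any target networks, and the inverse temperature, sample count, and noise scale are assumptions.

```python
# Hedged sketch of a softmax value estimate for the TD target (SD3-style, illustrative).
import torch

def softmax_value(critic_t, actor_t, next_state, n_samples=16, beta=5.0, noise_std=0.2):
    """Return a soft estimate of V(s') as a softmax-weighted average of sampled Q-values."""
    a = actor_t(next_state).unsqueeze(1)                          # (B, 1, act_dim)
    noise = noise_std * torch.randn(a.size(0), n_samples, a.size(-1))
    actions = (a + noise).clamp(-1.0, 1.0)                        # candidate actions around the policy
    s = next_state.unsqueeze(1).expand(-1, n_samples, -1)         # broadcast states over samples
    q = critic_t(torch.cat([s, actions], dim=-1)).squeeze(-1)     # (B, n_samples)
    weights = torch.softmax(beta * q, dim=-1)                     # softmax operator over actions
    return (weights * q).sum(dim=-1, keepdim=True)                # (B, 1) soft next-state value
```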
Research on an Energy Management Strategy for Hydrogen Fuel Cell Hybrid Electric Vehicles Based on TD3-PER (Cited by 1)
10
Authors: 虞志浩, 赵又群, 潘陈兵, 何鲲鹏, 李丹阳 《汽车技术》 CSCD 北大核心 2024, Issue 1, pp. 13-19 (7 pages)
To optimize the fuel economy of hydrogen fuel cell hybrid electric vehicles and the performance of the auxiliary power battery, an energy management strategy based on twin delayed deep deterministic policy gradient with prioritized experience replay (TD3-PER) is proposed. The twin delayed deep deterministic policy gradient (TD3) algorithm prevents overestimation during training while achieving more precise continuous control; combining it with prioritized experience replay (PER) accelerates policy training on top of the improved optimization performance. Simulation results show that, compared with the deep deterministic policy gradient (DDPG) algorithm, the proposed TD3-PER energy management strategy reduces hydrogen consumption per 100 km by 7.56% and average power fluctuation by 6.49%.
Keywords: hydrogen fuel cell hybrid electric vehicle; prioritized experience replay; twin delayed deep deterministic policy gradient; continuous control
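For readers unfamiliar with the "twin delayed" part of TD3-PER, the sketch below shows the generic TD3 target computation: target-policy smoothing noise plus the clipped minimum of two target critics, with the actor updated only every few critic steps. It is a textbook-style illustration, not the paper's energy-management model.

```python
# Generic TD3 target: target-policy smoothing + clipped double-Q minimum (illustrative).
import torch

def td3_target(critic1_t, critic2_t, actor_t, r, s2, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    with torch.no_grad():
        noise = (noise_std * torch.randn_like(actor_t(s2))).clamp(-noise_clip, noise_clip)
        a2 = (actor_t(s2) + noise).clamp(-1.0, 1.0)          # smoothed target action
        q1 = critic1_t(torch.cat([s2, a2], dim=-1))
        q2 = critic2_t(torch.cat([s2, a2], dim=-1))
        return r + gamma * (1 - done) * torch.min(q1, q2)    # clipped double-Q target

# The actor and the target networks are updated only every d critic steps (e.g. d = 2),
# which is the "delayed" part of TD3.
```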
Multi-Objective Optimal Control of the SNCR Denitrification System of a Cement Precalciner Based on Deep Reinforcement Learning (Cited by 1)
11
Authors: 刘定平, 吴泽豪 《中国电机工程学报》 EI CSCD 北大核心 2024, Issue 12, pp. 4815-4825, I0017 (12 pages)
Optimizing the process parameters of the selective non-catalytic reduction (SNCR) denitrification process can effectively reduce the NO_x emissions and denitrification operating costs of cement precalciners. Taking a cement precalciner as the research object, a LightGBM-based NO_x concentration prediction model is established, and with minimization of denitrification cost and NO_x concentration as the optimization objectives, the deep deterministic policy gradient (DDPG) algorithm is used to build an optimal control model for the process parameters of sludge co-firing combined with SNCR denitrification in the precalciner. The results show that the NO_x prediction model achieves a root mean squared error (RMSE) of 6.8 and a mean absolute percentage error (MAPE) of 3.48%. The DDPG algorithm can optimize the relevant process parameters: with an ammonia injection rate of 427.87 L/h and a sludge co-firing rate of 9.78 t/h, the NO_x emission concentration is 225.99 mg/Nm^3 and the denitrification operating cost is 1747.8 yuan/h. Compared with the results of other optimization algorithms and with conventional operating conditions, both the NO_x emission concentration and the denitrification operating cost decrease to varying degrees. Simulation and validation show that the model outputs reasonable combinations of ammonia injection and sludge co-firing rates, reduces fluctuations of the NO_x concentration at the SNCR outlet, effectively lowers NO_x emissions and denitrification cost, and achieves multi-objective optimal control of the SNCR denitrification system. These results provide a reference for the design of intelligent multi-objective optimal control of SNCR denitrification in cement precalciners.
Keywords: ammonia injection; sludge co-firing; selective non-catalytic reduction optimal control; LightGBM; reinforcement learning; deep deterministic policy gradient
DDPG-Based Performance Optimization of an Intelligent Reflecting Surface-Assisted SWIPT System (Cited by 1)
12
Authors: 罗丽平, 潘伟民 《物联网学报》 2024, Issue 2, pp. 46-55 (10 pages)
For an intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) simultaneous wireless information and power transfer (SWIPT) system, the beamforming vector at the base station and the reflective beamforming vector of the IRS are jointly optimized to maximize the information transmission rate, subject to the maximum transmit power of the base station, the unit-modulus constraint on the IRS reflection phase-shift matrix, and the minimum energy constraint of the energy receiver. To solve this non-convex optimization problem, a deep deterministic policy gradient (DDPG) algorithm based on deep reinforcement learning is proposed. Simulation results show that the average reward of DDPG depends on the learning rate; with a suitable learning rate, DDPG achieves average mutual information close to that of conventional optimization algorithms, while its running time is significantly lower than that of conventional non-convex optimization algorithms, and it still converges within a short time even as the numbers of antennas and reflecting elements increase. This indicates that DDPG can effectively improve computational efficiency and is better suited to communication services with strict real-time requirements.
Keywords: multiple-input single-output; simultaneous wireless information and power transfer; intelligent reflecting surface; beamforming; deep deterministic policy gradient
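A hedged sketch of how a continuous DDPG action can satisfy the unit-modulus constraint on the IRS phase-shift matrix: each action component is mapped to a phase, and the achievable rate follows the usual log2(1 + SNR) form. The channel matrices and the fixed transmit beamformer below are random placeholders, not the system model of the paper.

```python
# Mapping a DDPG action to unit-modulus IRS phase shifts (illustrative placeholders throughout).
import numpy as np

rng = np.random.default_rng(1)
N_RIS, N_TX = 16, 4
H_br = rng.normal(size=(N_RIS, N_TX)) + 1j * rng.normal(size=(N_RIS, N_TX))   # BS -> IRS channel
h_ru = rng.normal(size=(N_RIS,))       + 1j * rng.normal(size=(N_RIS,))       # IRS -> user channel

def action_to_phases(action):
    """Map an action in [-1, 1]^N to unit-modulus reflection coefficients."""
    theta = np.pi * np.asarray(action)                 # phases in [-pi, pi]
    return np.exp(1j * theta)                          # |phi_n| = 1 for every element

def achievable_rate(action, w, noise_power=1e-3):
    phi = np.diag(action_to_phases(action))
    h_eff = h_ru.conj() @ phi @ H_br                   # effective cascaded channel
    snr = np.abs(h_eff @ w) ** 2 / noise_power
    return np.log2(1 + snr)

w = np.ones(N_TX) / np.sqrt(N_TX)                      # placeholder transmit beamformer
print(achievable_rate(rng.uniform(-1, 1, N_RIS), w))
```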
A Splittable Task Offloading Algorithm for Satellite-Terrestrial Integrated Networks Based on Deep Deterministic Policy Gradient
13
Authors: 宋晓勤, 吴志豪, 赖海光, 雷磊, 张莉涓, 吕丹阳, 郑成辉 《通信学报》 EI CSCD 北大核心 2024, Issue 10, pp. 116-128 (13 pages)
To address the long task-offloading delay of satellite-ground links in low Earth orbit satellite networks, a splittable task offloading algorithm based on deep deterministic policy gradient (DDPG) is proposed for satellite-terrestrial integrated networks. A multi-access edge computing architecture of the satellite-terrestrial integrated network is modeled for users in different regions, and by applying a multi-agent DDPG algorithm, the objective of minimizing total system service delay is transformed into maximizing the agents' reward. The task-splitting ratios of users are optimized subject to offloading constraints such as sub-task offloading constraints and service delay constraints. Simulation results show that the proposed algorithm outperforms baseline algorithms in terms of user service delay and the number of benefiting users.
Keywords: satellite-terrestrial integrated network; deep deterministic policy gradient; resource allocation; multi-access edge computing
A DDPG-Based Intelligent Morphing Decision Method for Morphing Aircraft
14
Authors: 王青, 刘华华, 屈东扬 《宇航学报》 EI CAS CSCD 北大核心 2024, Issue 10, pp. 1560-1567 (8 pages)
For the autonomous morphing decision problem of a class of morphing aircraft, an intelligent morphing decision method based on the deep deterministic policy gradient (DDPG) algorithm is proposed. First, for an aircraft whose sweep angle can vary continuously, the aerodynamic parameters are obtained by computational fluid dynamics and the aerodynamic characteristics are analyzed. Then, combining the guidance process with the DDPG algorithm and taking optimal aerodynamic characteristics and guidance performance as the objective, an intelligent morphing decision algorithm for morphing aircraft is proposed. Finally, simulation results show that the proposed algorithm converges well and, compared with a fixed configuration, appropriate morphing commands yield the optimal aerodynamic shape while achieving better guidance performance.
Keywords: morphing aircraft; autonomous morphing decision; deep reinforcement learning; deep deterministic policy gradient algorithm
Application of the DDPG Deep Reinforcement Learning Algorithm to Target Tracking and Rescue with Unmanned Surface Vehicles
15
Authors: 宋雷震, 吕东芳 《黑龙江大学工程学报(中英俄文)》 2024, Issue 1, pp. 58-64 (7 pages)
To ensure the efficiency of maritime rescue operations, this study designs a target tracking and rescue algorithm for vessels based on the deep deterministic policy gradient (DDPG) algorithm, covering the state space, action space, and reward function, and applies it to unmanned surface vehicle tracking and rescue. The results show that the success rate of DDPG is stable at close to 100%, demonstrating excellent performance. The cumulative reward per episode of the designed algorithm stabilizes at around 10 and the average episode duration at around 80 s; the vessel can adjust its motion strategy according to the state of the surrounding environment, meeting the urgency requirements of maritime rescue and providing a new approach for research in related fields.
Keywords: unmanned surface vehicle; target tracking; maritime rescue; deep deterministic policy gradient (DDPG)
Collaborative Obstacle-Avoidance Trajectory Planning for Mobile Manipulators Based on an Artificial Potential Field DDPG Algorithm
16
Authors: 李勇, 张朝兴, 柴燎宁 《计算机集成制造系统》 EI CSCD 北大核心 2024, Issue 12, pp. 4282-4291 (10 pages)
To improve the obstacle-avoidance trajectory planning capability of mobile manipulators in narrow passages and under obstacle constraints, an improved algorithm combining the artificial potential field (APF) method with the deep deterministic policy gradient (DDPG) algorithm, APF-DDPG, is proposed. First, APF planning is designed for the manipulator to obtain an approximate posture; the problem is then formulated as a Markov decision process, and the state space, action space, and reward function are designed. The planning process is analyzed in stages, and a guidance mechanism is designed to transition between control stages: the obstacle-avoidance stage is trained mainly by DDPG, while in the goal-planning stage the approximate posture guides DDPG training, finally yielding a policy model for planning. Simulation experiments with fixed and random scenarios are established to verify the effectiveness of the proposed algorithm. The results show that, compared with the conventional DDPG algorithm, APF-DDPG converges more efficiently and yields a policy model with better control performance.
Keywords: mobile manipulator; obstacle-avoidance trajectory planning; artificial potential field; deep deterministic policy gradient; guided training
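The artificial potential field half of APF-DDPG can be summarized by one force computation: an attractive pull toward the goal plus a repulsive push from obstacles inside an influence radius. The sketch below uses the classical APF formulas with illustrative gains; it is not the paper's guidance mechanism.

```python
# Classical artificial potential field force for a point robot (illustrative gains).
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=0.5, rho0=1.0):
    """Return the net APF force acting on a point robot at `pos`."""
    force = k_att * (goal - pos)                       # attractive term: -grad of 0.5*k*||pos-goal||^2
    for obs in obstacles:
        diff = pos - obs
        rho = np.linalg.norm(diff)
        # repulsive term: -grad of 0.5*k*(1/rho - 1/rho0)^2, active only inside the influence radius
        if 1e-9 < rho < rho0:
            force += k_rep * (1.0 / rho - 1.0 / rho0) / rho**2 * (diff / rho)
    return force

pos, goal = np.array([0.0, 0.0]), np.array([3.0, 2.0])
obstacles = [np.array([0.5, 0.3])]
print(apf_force(pos, goal, obstacles))
```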
Energy Efficiency Optimization Based on Deep Deterministic Policy Gradient in MEC Networks
17
Authors: 陈卡 《火力与指挥控制》 CSCD 北大核心 2024, Issue 7, pp. 44-49 (6 pages)
Mobile edge computing (MEC) can provide data processing services for users, but the computing resources of MEC servers are limited. How users offload tasks to the MEC server and how the server allocates resources according to task requirements are therefore key factors for improving the energy efficiency of user terminals. A deep deterministic policy gradient-based energy efficiency optimization (DDPG-EEO) algorithm is proposed. Under delay constraints, an optimization problem of maximizing energy efficiency over the task offloading ratio and the resource allocation strategy is formulated, described as a Markov decision process (MDP), and solved with the deep deterministic policy gradient. Simulation results show that the DDPG-EEO algorithm reduces the energy consumption of user terminals (UTs) and improves the task completion rate.
Keywords: mobile edge computing; task offloading; resource allocation; reinforcement learning; deep deterministic policy gradient
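The DDPG-EEO formulation above can be made concrete with a toy reward: the agent chooses an offloading ratio and a server CPU share, and the reward is the user-side energy efficiency with a penalty when the delay bound is violated. All model parameters below (CPU frequencies, uplink rate, transmit power, energy coefficient) are illustrative assumptions rather than values from the paper.

```python
# Toy energy-efficiency reward for a splittable offloading decision (all constants assumed).
import numpy as np

def eeo_reward(offload_ratio, cpu_share, task_bits=1e6, cycles_per_bit=500,
               f_local=1e9, f_server=10e9, uplink_rate=20e6,
               p_tx=0.5, kappa=1e-27, delay_max=0.5, penalty=10.0):
    local_bits  = (1 - offload_ratio) * task_bits
    remote_bits = offload_ratio * task_bits
    t_local  = local_bits * cycles_per_bit / f_local
    t_up     = remote_bits / uplink_rate
    t_server = remote_bits * cycles_per_bit / (cpu_share * f_server)
    delay = max(t_local, t_up + t_server)                        # local and remote parts run in parallel
    e_local = kappa * local_bits * cycles_per_bit * f_local**2   # dynamic CPU energy model
    e_tx    = p_tx * t_up                                        # transmission energy
    efficiency = (task_bits / 1e6) / (e_local + e_tx + 1e-12)    # Mbit per joule at the user side
    return efficiency if delay <= delay_max else efficiency - penalty * (delay - delay_max)

print(eeo_reward(offload_ratio=0.6, cpu_share=0.3))
```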
Variable Speed Limit Control Method for Eight-Lane Freeway Work Zones Considering the Influence of Connected and Automated Vehicles
18
Authors: 过秀成, 肖哲, 张一鸣, 张叶平, 许鹏宇 《东南大学学报(自然科学版)》 EI CAS CSCD 北大核心 2024, Issue 2, pp. 353-359 (7 pages)
To improve the traffic efficiency and safety of freeway work zones in a connected-vehicle environment, a variable speed limit control method based on reinforcement learning is proposed. The intelligent driver model and a real-vehicle test model are selected to model the car-following behavior of conventional human-driven vehicles and connected and automated vehicles (CAVs), respectively. A composite reward is constructed with the traffic flow of the segment downstream of the bottleneck as the efficiency indicator and the standard deviation of speed on the bottleneck segment as the safety indicator, and the deep deterministic policy gradient algorithm is used to dynamically solve for the optimal speed limit on each lane. Simulation results show that the proposed variable speed limit control method effectively improves traffic efficiency and safety under different CAV penetration rates, with more pronounced improvements at low penetration rates. At a CAV penetration rate of 1.0, the downstream traffic flow increases by 10.1% and the mean speed standard deviation on the bottleneck segment decreases by 68.9%; at a penetration rate of 0, the downstream traffic flow increases by 20.7% and the mean speed standard deviation decreases by 78.1%. The introduction of CAVs can increase the downstream bottleneck traffic flow by up to 52.0%.
Keywords: variable speed limit control; deep deterministic policy gradient algorithm; eight-lane freeway work zone; connected and automated vehicles; cooperative adaptive cruise control
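The composite reward described above combines a throughput term and a speed-dispersion term; a minimal sketch, with assumed weights and normalization constants, is given below. The per-lane speed limits themselves would be the continuous DDPG action.

```python
# Hedged sketch of a composite efficiency/safety reward for variable speed limit control.
import numpy as np

def vsl_reward(downstream_flow_vph, bottleneck_speeds_kmh,
               w_eff=1.0, w_safe=1.0, flow_norm=2000.0, std_norm=20.0):
    efficiency = downstream_flow_vph / flow_norm                 # normalized downstream throughput
    safety = np.std(bottleneck_speeds_kmh) / std_norm            # normalized speed dispersion (lower is safer)
    return w_eff * efficiency - w_safe * safety

# e.g. the DDPG action could be four per-lane speed limits in [60, 120] km/h
print(vsl_reward(1650.0, [78.0, 92.0, 64.0, 85.0]))
```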
Research on Behavior Decision-Making for Intelligent Vehicles Based on Deep Reinforcement Learning (Cited by 2)
19
Authors: 周恒恒, 高松, 王鹏伟, 崔凯晨, 张宇龙 《科学技术与工程》 北大核心 2024, Issue 12, pp. 5194-5203 (10 pages)
The decision-making system of an autonomous vehicle directly affects its overall driving performance and is one of the key problems to be solved in realizing autonomous driving. Based on the deep reinforcement learning algorithm DDPG (deep deterministic policy gradient), an end-to-end driving behavior decision model is proposed. First, combined with a driver model, 64 dimensions of state-space information covering the ego vehicle, the road, and interfering vehicles are selected as the input data set to train the decision model, which outputs reasonable driving behaviors and control quantities. To address the reward and control-quantity jumps observed during training and testing, the DDPG decision model is improved to optimize decision and control performance, and simulation experiments are conducted on the TORCS (The Open Racing Car Simulator) platform for verification. The results show that the proposed decision model can output reasonable driving behaviors and control quantities from real-time vehicle and environment state information; compared with the DDPG model, the improved model achieves better control accuracy, significantly reduces lateral speed, and clearly improves ride comfort and vehicle stability.
Keywords: autonomous driving; behavior decision-making; deep reinforcement learning; deep deterministic policy gradient algorithm
Research on an RF-DDPG Vehicle Control Algorithm for Path Optimization in Autonomous Driving (Cited by 2)
20
Authors: 焦龙飞, 谷志茹, 舒小华, 袁鹏, 王建斌 《湖南工业大学学报》 2024, Issue 1, pp. 62-69 (8 pages)
To address the low target-path tracking accuracy and poor robustness of autonomous vehicles during driving, a reward function-deep deterministic policy gradient (RF-DDPG) path tracking algorithm is proposed. Built on the deep reinforcement learning algorithm DDPG, it designs the reward function of DDPG to optimize the algorithm's parameters and achieve the required tracking accuracy and stability. The original DDPG algorithm and the improved RF-DDPG path tracking control algorithm are compared in simulation experiments on the Apollo autonomous driving simulation platform. The results show that the proposed RF-DDPG algorithm outperforms DDPG in both path tracking accuracy and robustness.
Keywords: autonomous driving; path tracking; deep reinforcement learning; path control; DDPG algorithm