Journal Articles
32 articles found
Perception Enhanced Deep Deterministic Policy Gradient for Autonomous Driving in Complex Scenarios
1
Authors: Lyuchao Liao, Hankun Xiao, Pengqi Xing, Zhenhua Gan, Youpeng He, Jiajun Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 7, pp. 557-576
Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts bring a set of challenges to autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, rendering them a vital issue in the practical application of autonomous driving. To address these traffic challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) for autonomous driving in roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle's capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model combines a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and augmenting throughput efficiency. Extensive experiments were conducted with the collaborative simulation platform of CARLA and SUMO, and the experimental results show that the proposed PE-DDPG outperforms the baseline methods in terms of the convergence capacity of the training process, the smoothness of driving, and the traffic efficiency with diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). Generally, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
Keywords: autonomous driving, traffic roundabouts, deep deterministic policy gradient, spatial attention mechanisms
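Most entries in this listing build on the standard DDPG actor-critic update; a minimal sketch of that core update is given below for reference. It is not code from the PE-DDPG paper: the network sizes, hyperparameters, batch format, and dimensions are illustrative assumptions.

```python
# Hedged sketch: one DDPG update step (critic TD update, actor update, soft target update).
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, out_act=None):
        super().__init__()
        layers = [nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim)]
        if out_act is not None:
            layers.append(out_act)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

obs_dim, act_dim, gamma, tau = 32, 2, 0.99, 0.005          # illustrative sizes
actor, actor_t = MLP(obs_dim, act_dim, nn.Tanh()), MLP(obs_dim, act_dim, nn.Tanh())
critic, critic_t = MLP(obs_dim + act_dim, 1), MLP(obs_dim + act_dim, 1)
actor_t.load_state_dict(actor.state_dict()); critic_t.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(batch):
    # batch sampled from a replay buffer; rew and done have shape (batch_size, 1)
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():  # TD target uses the frozen target networks
        next_q = critic_t(torch.cat([next_obs, actor_t(next_obs)], dim=-1))
        target = rew + gamma * (1.0 - done) * next_q
    # Critic: minimize TD error against the target
    q = critic(torch.cat([obs, act], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: deterministic policy gradient, ascend Q(s, mu(s))
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Polyak-averaged soft update of both target networks
    for net, tgt in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```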
Parameter Tuning of a Manipulator Motion-Tracking Controller Based on Policy Gradient (Cited by: 3)
2
Authors: 韩霖骁, 胡剑波, 宋仕元, 王应洋, 贺子厚, 张鹏. 系统工程与电子技术 (Systems Engineering and Electronics) (EI, CSCD, PKU Core), 2021, No. 9, pp. 2605-2611
To address the self-tuning problem of the parameters of a manipulator motion-tracking controller, a parameter tuner based on the reinforcement-learning Policy Gradient method is designed. First, a hybrid dynamic model of the manipulator is introduced; based on this system model, a proportional-derivative (PD) controller is designed and its Lyapunov stability is proved, from which the admissible range of the parameter matrices is derived. Second, a Policy Gradient based parameter tuner is designed and improved by introducing an integrator, so that the tuned parameters evolve continuously and the control performance of the PD controller is further improved. Finally, a second-order manipulator system is used as an example for simulation verification. The experimental data demonstrate the effectiveness and feasibility of the parameter tuner, which can effectively improve the dynamic performance of the system.
Keywords: manipulator, motion tracking, policy gradient, parameter tuning, proportional-derivative control
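Not part of the indexed abstract: a minimal sketch of how a vanilla policy-gradient (REINFORCE) update can tune PD gains, assuming a Gaussian policy over the gains and a placeholder rollout() returning negative tracking error. The paper's actual tuner additionally uses an integrator to keep the gain trajectory continuous, which is omitted here.

```python
# Hedged sketch: REINFORCE-style tuning of PD gains [Kp, Kd] on a toy reward surface.
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([20.0, 5.0])   # mean of the Gaussian policy over [Kp, Kd]
sigma = np.array([2.0, 0.5])    # fixed exploration std per gain
alpha = 1e-3                    # learning rate
baseline = 0.0                  # running reward baseline to reduce variance

def rollout(gains):
    """Placeholder: run the arm tracking task with the given PD gains and
    return a scalar reward, e.g. negative integrated tracking error."""
    kp, kd = gains
    return -((kp - 30.0) ** 2 + (kd - 8.0) ** 2)   # toy reward surface

for episode in range(2000):
    gains = rng.normal(theta, sigma)               # sample gains from the policy
    reward = rollout(gains)
    baseline = 0.9 * baseline + 0.1 * reward
    # REINFORCE: gradient of log N(gains | theta, sigma) with respect to theta
    grad_log_pi = (gains - theta) / sigma ** 2
    theta = theta + alpha * (reward - baseline) * grad_log_pi   # ascend expected reward

print("tuned PD gains [Kp, Kd]:", theta)
```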
Optimizing the Multi-Objective Discrete Particle Swarm Optimization Algorithm by Deep Deterministic Policy Gradient Algorithm
3
Authors: Sun Yang-Yang, Yao Jun-Ping, Li Xiao-Jun, Fan Shou-Xiang, Wang Zi-Wei. Journal on Artificial Intelligence, 2022, No. 1, pp. 27-35
Deep deterministic policy gradient (DDPG) has been proven effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work aims to probe into this topic. Experiments showed that DDPG can not only quickly improve the convergence speed of MODPSO but also overcome the local-optimum problem that MODPSO may suffer from. The research findings are of great significance for the theoretical research and application of MODPSO.
Keywords: deep deterministic policy gradient, multi-objective discrete particle swarm optimization, deep reinforcement learning, machine learning
A policy gradient algorithm integrating long and short-term rewards for soft continuum arm control (Cited by: 2)
4
Authors: DONG Xiang, ZHANG Jing, CHENG Long, XU WenJun, SU Hang, MEI Tao. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2022, No. 10, pp. 2409-2419
The soft continuum arm has extensive application in industrial production and human life due to its superior safety and flexibility. Reinforcement learning is a powerful technique for solving soft arm continuous control problems, as it can learn an effective control policy with an unknown system model. However, it is often affected by high sample complexity and requires huge amounts of training data, which limits its effectiveness in soft arm control. An improved policy gradient method, policy gradient integrating long and short-term rewards (PGLS), is proposed in this paper to overcome this issue. The short-term rewards provide more dynamics-aware exploration directions for policy learning and improve the exploration efficiency of the algorithm. PGLS can be integrated into current policy gradient algorithms, such as deep deterministic policy gradient (DDPG). The overall control framework is realized and demonstrated in a dynamics simulation environment. Simulation results show that this approach can effectively control the soft arm to reach and track the targets. Compared with DDPG and other model-free reinforcement learning algorithms, the proposed PGLS algorithm achieves a great improvement in convergence speed and performance. In addition, a fluid-driven soft manipulator is designed and fabricated in this paper, which can be used to verify the proposed PGLS algorithm in real experiments in the future.
Keywords: soft arm control, Cosserat rod, deep reinforcement learning, policy gradient algorithm, high sample complexity
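Not from the paper itself: a small sketch of the general idea of blending a discounted long-term return with an immediate, dynamics-aware short-term bonus. The weighting coefficient eta and the distance-reduction bonus are assumptions for illustration; the exact PGLS formulation may differ.

```python
# Hedged sketch: returns that mix a sparse long-term task reward with a local short-term bonus.
import numpy as np

def blended_return(rewards, short_bonuses, gamma=0.99, eta=0.3):
    """Discounted return of the task (long-term) reward plus an immediate,
    undiscounted short-term bonus that steers exploration at each step."""
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g               # standard long-term return
        returns[t] = g + eta * short_bonuses[t]  # add the local, dynamics-aware term
    return returns

# Example: short-term bonus = reduction of distance to the target at each step
dist = np.array([1.0, 0.8, 0.5, 0.2, 0.05])          # end-effector distance to target
task_reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])    # sparse terminal reward
short = np.concatenate([[0.0], dist[:-1] - dist[1:]])
print(blended_return(task_reward, short))
```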
Deep Reinforcement Learning Based Resource Allocation Optimization for IRS-Assisted NOMA-MEC Communications
5
Authors: 方娟, 刘珍珍, 陈思琪, 李硕朋. 北京工业大学学报 (Journal of Beijing University of Technology) (CAS, CSCD, PKU Core), 2024, No. 8, pp. 930-938
To enable task offloading for blind-zone edge users that cannot establish a direct communication link with the edge server, a deep reinforcement learning (DRL) based resource allocation optimization algorithm is designed for intelligent reflecting surface (IRS) assisted non-orthogonal multiple access (NOMA) communications. The goal is to maximize a system utility weighted by the system sum rate and the energy efficiency (EE), thereby achieving green and efficient communication. The deep deterministic policy gradient (DDPG) algorithm is used to jointly optimize the transmit power allocation and the reflection phase-shift matrix of the IRS. Simulation results show that using the DDPG algorithm for communication resource allocation in mobile edge computing (MEC) outperforms several baseline algorithms.
Keywords: non-orthogonal multiple access (NOMA), intelligent reflecting surface (IRS), deep deterministic policy gradient (DDPG) algorithm, mobile edge computing (MEC), energy efficiency (EE), system utility
A UAV collaborative defense scheme driven by DDPG algorithm
6
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 5, pp. 1211-1224
The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches based on value iteration and policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to achieve autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense scheme with DDPG via swarms of unmanned aerial vehicles (UAVs) is developed and validated, showing promising practical value in defense effectiveness. We address the sparse-reward problem of reinforcement learning in a long-term task by building the reward function of the UAV swarm and optimizing the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy, and promoting the intelligent development of UAV swarms and their decision-making processes.
Keywords: deep deterministic policy gradient (DDPG) algorithm, unmanned aerial vehicles (UAVs) swarm, task decision making, deep reinforcement learning, sparse reward problem
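Not part of the record above: a hedged sketch of reward shaping for a long-horizon, sparse-reward defense task, combining a dense progress term with sparse terminal bonuses. All weights and terms are assumptions for illustration; the paper's reward design for the UAV swarm may differ.

```python
# Hedged sketch: shaped reward = dense progress toward the intruder + sparse terminal outcomes.
def swarm_defense_reward(dist_to_intruder, prev_dist, collided, intercepted, step_cost=0.01):
    reward = 0.0
    reward += 2.0 * (prev_dist - dist_to_intruder)  # dense term: reward closing the gap
    reward -= step_cost                              # mild penalty for elapsed time
    if collided:
        reward -= 10.0                               # safety penalty
    if intercepted:
        reward += 100.0                              # sparse success bonus
    return reward

# Example: closing 5 m in one step, no collision, not yet intercepted
print(swarm_defense_reward(dist_to_intruder=95.0, prev_dist=100.0,
                           collided=False, intercepted=False))  # -> 2.0*5.0 - 0.01 = 9.99
```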
Deep reinforcement learning based task offloading in blockchain enabled smart city
7
Authors: 金凯琦, WU Wenjun, GAO Yang, YIN Yufen, SI Pengbo. High Technology Letters (EI, CAS), 2023, No. 3, pp. 295-304
With the expansion of cities and emerging complicated applications, the smart city has become an intelligent management mechanism. To guarantee the information security and quality of service (QoS) of Internet of Things (IoT) devices in the smart city, a mobile edge computing (MEC) enabled blockchain system is considered as the smart city scenario, where the offloading of computing tasks is a key factor affecting system performance in terms of service profit and latency. The task offloading process is formulated as a Markov decision process (MDP), and the optimization goal is the cumulative profit of the offloading nodes, considering task profit and service latency cost, under the constraints of system timeout and processing resources. Then, a policy gradient based task offloading (PG-TO) algorithm is proposed to solve the optimization problem. Finally, numerical results show that the proposed PG-TO performs better than the comparison algorithm, and the system performance and QoS are analyzed respectively. The testing results indicate that the proposed method generalizes well.
Keywords: mobile edge computing (MEC), blockchain, policy gradient, task offloading
RIS-Assisted UAV-D2D Communications Exploiting Deep Reinforcement Learning
8
Authors: YOU Qian, XU Qian, YANG Xin, ZHANG Tao, CHEN Ming. ZTE Communications, 2023, No. 2, pp. 61-69
Device-to-device (D2D) communications underlaying cellular networks enabled by unmanned aerial vehicles (UAV) have been regarded as promising techniques for next-generation communications. To mitigate the strong interference caused by the line-of-sight (LoS) air-to-ground channels, we deploy a reconfigurable intelligent surface (RIS) to rebuild the wireless channels. A joint optimization problem over the transmit power of the UAV, the transmit power of the D2D users, and the RIS phase configuration is investigated to maximize the achievable rate of the D2D users while satisfying the quality of service (QoS) requirement of the cellular users. Due to the high channel dynamics and the coupling among the cellular users, the RIS, and the D2D users, it is challenging to find a proper solution. Thus, a RIS softmax deep double deterministic (RIS-SD3) policy gradient method is proposed, which can smooth the optimization space and reduce the number of local optima. Specifically, the SD3 algorithm maximizes the reward of the agent by training it to maximize the value function after the softmax operator is introduced. Simulation results show that the proposed RIS-SD3 algorithm can significantly improve the rate of the D2D users while controlling the interference to the cellular user. Moreover, the proposed RIS-SD3 algorithm is more robust than the twin delayed deep deterministic (TD3) policy gradient algorithm in a dynamic environment.
Keywords: device-to-device communications, reconfigurable intelligent surface, deep reinforcement learning, softmax deep double deterministic policy gradient
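Not from the indexed paper: a sketch of the softmax value estimate that distinguishes SD3-style targets from plain DDPG/TD3 targets, in which Q-values of actions sampled around the target policy are combined with softmax weights rather than taking a single (or minimum) estimate. The Gaussian sampling scheme, the inverse temperature beta, and the tensor shapes are assumptions for illustration.

```python
# Hedged sketch: softmax-weighted value estimate over actions sampled near the target policy.
import torch

def softmax_target_value(critic_t, actor_t, next_obs, beta=1.0, n_samples=16, noise_std=0.2):
    """Approximate V(s') by weighting sampled Q-values with softmax(beta * Q)."""
    mu = actor_t(next_obs)                                     # (B, act_dim) target action
    noise = noise_std * torch.randn(n_samples, *mu.shape)      # (N, B, act_dim)
    actions = (mu.unsqueeze(0) + noise).clamp(-1.0, 1.0)       # sample around the target policy
    obs_rep = next_obs.unsqueeze(0).expand(n_samples, *next_obs.shape)
    q = critic_t(torch.cat([obs_rep, actions], dim=-1)).squeeze(-1)  # (N, B)
    weights = torch.softmax(beta * q, dim=0)                   # softmax over the N samples
    return (weights * q).sum(dim=0).unsqueeze(-1)              # (B, 1) soft value estimate
```

The bootstrap target then becomes r + gamma * (1 - done) * softmax_target_value(...), replacing the plain Q(s', mu(s')) term used by DDPG.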
A Collaborative Machine Learning Scheme for Traffic Allocation and Load Balancing for URLLC Service in 5G and Beyond
9
Authors: Andreas G. Papidas, George C. Polyzos. Journal of Computer and Communications, 2023, No. 11, pp. 197-207
Key challenges for 5G and Beyond networks relate to the requirements for exceptionally low latency, high reliability, and extremely high data rates. The Ultra-Reliable Low Latency Communication (URLLC) use case is the trickiest to support, and current research is focused on physical or MAC layer solutions, while proposals focused on the network layer using Machine Learning (ML) and Artificial Intelligence (AI) algorithms running on base stations and User Equipment (UE) or Internet of Things (IoT) devices are in early stages. In this paper, we describe the operating rationale of the most relevant recent ML algorithms and techniques, and we propose and validate ML algorithms running on both cells (base stations/gNBs) and UEs or IoT devices to handle URLLC service control. One ML algorithm runs on base stations to evaluate latency demands and offload traffic in case of need, while another lightweight algorithm runs on UEs and IoT devices to rank cells with the best URLLC service in real time, indicating the best cell for a UE or IoT device to camp on. We show that the interplay of these algorithms leads to good service control and eventually optimal load allocation under slow load mobility.
Keywords: 5G and B5G networks, Ultra-Reliable Low Latency Communications (URLLC), Machine Learning (ML) for 5G, Temporal Difference Methods (TDM), Monte Carlo methods, policy gradient methods
Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs (Cited by: 12)
10
Authors: LI Yue, QIU Xiaohui, LIU Xiaodong, XIA Qunli. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2020, No. 4, pp. 734-742
The ever-changing battlefield environment requires the use of robust and adaptive technologies integrated into a reliable platform. Unmanned combat aerial vehicles (UCAVs) aim to integrate such advanced technologies while increasing the tactical capabilities of combat aircraft. As a research object, a common UCAV uses a neural network fitting strategy to obtain the values of attack areas. However, this simple strategy cannot cope with complex environmental changes or autonomously optimize decision-making. To solve this problem, this paper proposes a new deep deterministic policy gradient (DDPG) strategy based on deep reinforcement learning for the attack-area fitting of UCAVs on the future battlefield. Simulation results show that the autonomy and environmental adaptability of UCAVs are improved with the new DDPG algorithm and that the training process converges quickly. With the well-trained deep network, the optimal values of attack areas can be obtained in real time during the whole flight.
Keywords: attack area, neural network, deep deterministic policy gradient (DDPG), unmanned combat aerial vehicle (UCAV)
Distributed optimization of electricity-gas-heat integrated energy system with multi-agent deep reinforcement learning (Cited by: 3)
11
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang. Global Energy Interconnection (EI, CAS, CSCD), 2022, No. 6, pp. 604-617
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) is characterized by strong coupling, non-convexity, and nonlinearity. Centralized optimization methods have high communication costs and complex modeling, while traditional numerical iterative solutions cannot handle uncertainty or solve efficiently enough for online application. For the coordinated optimization of the electricity-gas-heat IES, this study constructs a model of the distributed IES with a dynamic distribution factor and transforms the centralized optimization problem into a distributed optimization problem in a multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient. Introducing the dynamic distribution factor allows the system to consider the impact of real-time changes in supply and demand on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving system economy. Compared with centralized optimization, the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load during training. Compared with the traditional iterative solution method, it copes better with uncertainty and realizes real-time decision making, which is conducive to online application. Finally, the effectiveness of the proposed method is verified using an example of an IES coupled with three energy-hub agents.
Keywords: integrated energy system, multi-agent system, distributed optimization, multi-agent deep deterministic policy gradient, real-time optimization decision
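Not part of the record: a minimal sketch of the centralized-critic / decentralized-actor structure behind multi-agent DDPG (MADDPG), which the distributed IES formulation above relies on. The agent count (three energy-hub agents as in the example), dimensions, and network shapes are illustrative assumptions.

```python
# Hedged sketch: MADDPG structure - each actor sees only its own observation,
# while each critic sees all observations and all actions during training.
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 10, 2   # e.g., three energy-hub agents

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 128), nn.ReLU(), nn.Linear(128, o))

actors = [nn.Sequential(mlp(obs_dim, act_dim), nn.Tanh()) for _ in range(n_agents)]
critics = [mlp(n_agents * (obs_dim + act_dim), 1) for _ in range(n_agents)]  # centralized critics

obs = [torch.randn(8, obs_dim) for _ in range(n_agents)]      # batch of 8 local observations
acts = [actors[i](obs[i]) for i in range(n_agents)]           # decentralized execution
joint = torch.cat(obs + acts, dim=-1)                         # centralized critic input
q_agent0 = critics[0](joint)                                  # agent 0's action value
print(q_agent0.shape)                                         # torch.Size([8, 1])
```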
Moving target defense of routing randomization with deep reinforcement learning against eavesdropping attack (Cited by: 2)
12
Authors: Xiaoyu Xu, Hao Hu, Yuling Liu, Jinglei Tan, Hongqi Zhang, Haotian Song. Digital Communications and Networks (SCIE, CSCD), 2022, No. 3, pp. 373-387
Eavesdropping attacks have become one of the most common attacks on networks because of their easy implementation. They not only lead to transmission data leakage but also develop into other, more harmful attacks. Routing randomization is a relevant research direction for moving target defense and has been proven to be an effective method to resist eavesdropping attacks. To counter such attacks, in this study we analyzed the existing routing randomization methods and found that their security and usability need to be further improved. According to the characteristics of eavesdropping attacks, which are "latent and transferable", a routing randomization defense method based on deep reinforcement learning is proposed. The proposed method realizes routing randomization at packet-level granularity using programmable switches. To improve the security and quality of service of legitimate services in networks, we use the deep deterministic policy gradient to generate random routing schemes with support from powerful network state awareness. In-band network telemetry provides real-time, accurate, and comprehensive network state awareness for the proposed method. Various experiments show that, compared with other typical routing randomization defense methods, the proposed method has obvious advantages in security and usability against eavesdropping attacks.
Keywords: routing randomization, moving target defense, deep reinforcement learning, deep deterministic policy gradient
An Initial Residual Stress Inference Method by Incorporating Monitoring Data and Mechanism Model
13
Authors: Shuguo Wang, Yingguang Li, Changqing Liu, Zhiwei Zhao. Chinese Journal of Mechanical Engineering (SCIE, EI, CAS, CSCD), 2022, No. 5, pp. 47-65
Initial residual stress is the main cause of machining deformation of the workpiece, which has been deemed one of the most important machining quality issues. Inferring the distribution of initial residual stress inside the blank is of great significance for machining deformation control. Due to the principle errors of existing residual stress detection methods, practical applications still face challenges. Aiming at the detection problem of the initial residual stress field, this paper proposes an initial residual stress inference method that incorporates monitoring data and a mechanism model. Monitoring data acquired during the machining process are used to represent the macroscopic characterization of the unbalanced residual stress, and a finite element numerical model is used as the mechanism model, solving the problem that an analytic mechanism model is difficult to establish. The policy gradient approach is introduced to solve the gradient descent problem of the combined learning model and mechanism model. Finally, the initial residual stress field is obtained through iterative calculation based on the fusion of monitoring data and the mechanism model. Verification results show that the proposed inference method can accurately and effectively reflect the machining deformation observed in the actual machining process.
Keywords: initial residual stress, inference, monitoring data, mechanism model, policy gradient
Low Carbon Economic Dispatch of Integrated Energy System Considering Power Supply Reliability and Integrated Demand Response
14
Authors: Jian Dong, Haixin Wang, Junyou Yang, Liu Gao, Kang Wang, Xiran Zhou. Computer Modeling in Engineering & Sciences (SCIE, EI), 2022, No. 7, pp. 319-340
Integrated energy system optimization scheduling can improve energy efficiency and the low-carbon economy. This paper studies an electric-gas-heat integrated energy system, including a carbon capture system, energy coupling equipment, and renewable energy. An energy scheduling strategy based on deep reinforcement learning is proposed to minimize operating cost and carbon emissions and to enhance power supply reliability. First, the low-carbon mathematical model of the combined heat and power unit, carbon capture system, and power-to-gas unit (CCP) is established. Subsequently, a low-carbon multi-objective optimization model is established that considers system operating cost, carbon emission cost, integrated demand response, wind and photovoltaic curtailment, and load shedding costs. Furthermore, considering the intermittency of wind power generation and the flexibility of load demand, the low-carbon economic dispatch problem is modeled as a Markov decision process. The twin delayed deep deterministic policy gradient (TD3) algorithm is used to solve this complex scheduling problem. The effectiveness of the proposed method is verified in simulation case studies. Compared with the SAC, A3C, DDPG, and DQN algorithms, the operating cost is reduced by 8.6%, 4.3%, 6.1%, and 8.0%, respectively.
Keywords: integrated energy system, twin delayed deep deterministic policy gradient, economic dispatch, power supply reliability, integrated demand response
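Not from the paper: a sketch of the two target-side modifications that distinguish TD3 from DDPG, as referenced in the abstract above, namely clipped double-Q (twin critics) and target-policy smoothing, with a note on delayed actor updates. Hyperparameters and tensor shapes are assumptions for illustration.

```python
# Hedged sketch: TD3 bootstrap target with twin critics and target-policy smoothing.
import torch

def td3_target(critic1_t, critic2_t, actor_t, next_obs, rew, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    with torch.no_grad():
        a_mu = actor_t(next_obs)
        # Target-policy smoothing: perturb the target action with clipped noise
        noise = (noise_std * torch.randn_like(a_mu)).clamp(-noise_clip, noise_clip)
        a_next = (a_mu + noise).clamp(-1.0, 1.0)
        sa = torch.cat([next_obs, a_next], dim=-1)
        # Clipped double-Q: take the minimum of the two target critics
        q_next = torch.min(critic1_t(sa), critic2_t(sa))
        return rew + gamma * (1.0 - done) * q_next

# The third TD3 trick is delayed policy updates: the actor and target networks are
# refreshed only once every few critic updates, e.g.
#   if step % policy_delay == 0: update_actor(); soft_update_targets()
```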
Deep reinforcement learning for online scheduling of photovoltaic systems with battery energy storage systems
15
Authors: Yaze Li, Jingxian Wu, Yanjun Pan. Intelligent and Converged Networks (EI), 2024, No. 1, pp. 28-41
A new online scheduling algorithm is proposed for photovoltaic (PV) systems with battery-assisted energy storage systems (BESS). The stochastic nature of renewable energy sources necessitates the employment of BESS to balance energy supply and demand under uncertain weather conditions. The proposed online scheduling algorithm aims at minimizing the overall energy cost by performing actions such as load shifting and peak shaving through carefully scheduled BESS charging/discharging activities. The scheduling algorithm is developed using deep deterministic policy gradient (DDPG), a deep reinforcement learning (DRL) algorithm that can deal with continuous state and action spaces. One of the main contributions of this work is a new DDPG reward function designed around the unique behaviors of energy systems. The new reward function can guide the scheduler to learn the appropriate load shifting and peak shaving behaviors through a balanced process of exploration and exploitation. The new scheduling algorithm is tested through case studies using real-world data, and the results indicate that it outperforms existing algorithms such as deep Q-learning. The online algorithm can efficiently learn the behavior of optimal non-causal offline algorithms.
Keywords: photovoltaic (PV), battery energy storage system (BESS), Markov decision process (MDP), deep deterministic policy gradient (DDPG)
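Not part of the indexed record: a hedged sketch of a reward that encourages load shifting and peak shaving for BESS scheduling, in the spirit of the custom DDPG reward described above. The tariff structure, the peak threshold, and the penalty weights are assumptions for illustration, not the paper's reward.

```python
# Hedged sketch: per-interval reward = -(energy cost) - peak-shaving penalty - SoC-safety penalty.
def bess_reward(grid_power_kw, price_per_kwh, peak_threshold_kw,
                soc, dt_hours=1.0, peak_penalty=0.5, soc_penalty=1.0):
    cost = grid_power_kw * dt_hours * price_per_kwh        # energy cost this interval
    peak_excess = max(0.0, grid_power_kw - peak_threshold_kw)
    reward = -cost - peak_penalty * peak_excess             # pay less, shave peaks
    if soc < 0.1 or soc > 0.9:                              # keep the battery in a safe SoC band
        reward -= soc_penalty
    return reward

# Example: importing 120 kW at $0.30/kWh against a 100 kW peak threshold
print(bess_reward(120.0, 0.30, 100.0, soc=0.5))   # -> -36.0 - 0.5*20 = -46.0
```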
A novel movies recommendation algorithm based on reinforcement learning with DDPG policy (Cited by: 1)
16
Authors: Qiaoling Zhou. International Journal of Intelligent Computing and Cybernetics (EI), 2020, No. 1, pp. 67-79
Purpose - English original movies play an important role in English learning and communication. To find the required movies from a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have solved the problem of information overload, they still have limitations in the case of cold start and sparse data. Design/methodology/approach - To solve these problems, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to address the cold-start and sparse-data problems and uses Item2vec to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to ensure that the neural network does not converge to a local optimum prematurely. Findings - To verify the feasibility and validity of the proposed algorithm, the state of the art and the proposed algorithm are compared in terms of RMSE, recall rate, and accuracy on the MovieLens English original movie data set. Experimental results show that the proposed algorithm is superior to the conventional algorithm in these indicators. Originality/value - Applied to recommending English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.
Keywords: reinforcement learning, deep deterministic policy gradient, English original movies, movies recommendation, cold start
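Not from the paper: one possible way to combine cosine and Euclidean terms into a recommendation reward, rewarding directional similarity while penalizing distance between the recommended item embedding and the user preference vector. The 0.5/0.5 weighting and the vectors are illustrative assumptions; the paper's exact combination may differ.

```python
# Hedged sketch: reward mixing cosine similarity and (negated) Euclidean distance.
import numpy as np

def recommendation_reward(item_vec, user_vec, w_cos=0.5, w_euc=0.5):
    cos = np.dot(item_vec, user_vec) / (np.linalg.norm(item_vec) * np.linalg.norm(user_vec) + 1e-8)
    euc = np.linalg.norm(item_vec - user_vec)
    return w_cos * cos - w_euc * euc      # high similarity, low distance -> high reward

item = np.array([0.2, 0.9, 0.1])          # embedding of the recommended movie (e.g., from Item2vec)
user = np.array([0.3, 0.8, 0.0])          # user preference vector
print(recommendation_reward(item, user))
```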
On-Ramp Merging for Highway Autonomous Driving: An Application of a New Safety Indicator in Deep Reinforcement Learning (Cited by: 2)
17
Authors: Guofa Li, Weiyan Zhou, Siyan Lin, Shen Li, Xingda Qu. Automotive Innovation (EI, CSCD), 2023, No. 3, pp. 453-465
This paper proposes an improved decision-making method based on deep reinforcement learning to address on-ramp merging challenges in highway autonomous driving. A novel safety indicator, time difference to merging (TDTM), is introduced and used in conjunction with the classic time to collision (TTC) indicator to evaluate driving safety and assist the merging vehicle in finding a suitable gap in traffic, thereby enhancing driving safety. The autonomous driving agent is trained with the Deep Deterministic Policy Gradient (DDPG) algorithm, and an action-masking mechanism is deployed to prevent unsafe actions during the policy exploration phase. The proposed DDPG+TDTM+TTC solution is tested in on-ramp merging scenarios with different driving speeds in SUMO and achieves a merging success rate of 99.96% without significantly impacting traffic efficiency on the main road, higher than that of DDPG+TTC and DDPG alone.
Keywords: autonomous driving, on-ramp merging, deep reinforcement learning, action-masking mechanism, Deep Deterministic Policy Gradient (DDPG)
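Not part of the indexed record: a hedged sketch of an action-masking step that overrides unsafe acceleration commands during exploration, in the spirit of the mechanism described above. The TTC threshold, the fallback braking action, and the acceleration limits are assumptions for illustration.

```python
# Hedged sketch: replace the policy's action with hard braking when time-to-collision is too small.
import numpy as np

def mask_action(accel_cmd, gap_m, rel_speed_mps, ttc_threshold_s=3.0, max_brake=-3.0):
    """Clip or override the policy's acceleration command based on TTC to the leader."""
    closing = rel_speed_mps > 0.0                        # ego is approaching the leading vehicle
    ttc = gap_m / rel_speed_mps if closing else np.inf   # time to collision, seconds
    if ttc < ttc_threshold_s:
        return max_brake                                  # unsafe: force a braking action
    return float(np.clip(accel_cmd, -3.0, 3.0))           # otherwise keep the (bounded) command

print(mask_action(accel_cmd=1.5, gap_m=12.0, rel_speed_mps=6.0))  # TTC = 2 s -> -3.0 (brake)
```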
Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys
18
Authors: Tianbo DENG, Hao HUANG, Yangwang FANG, Jie YAN, Haoyu CHENG. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 12, pp. 309-324
In this paper, a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed to intercept a maneuvering target equipped with an infrared decoy. First, to deal with the issue that the missile cannot accurately distinguish the target from the decoy, the energy center method is employed to obtain the equivalent energy center (called the virtual target) of the target and decoy, and the model of the missile and the virtual target is established. Then, an improved DDPG algorithm is proposed based on a trusted-search strategy, which significantly increases the training efficiency of the original DDPG algorithm. Furthermore, combining the established model, the network obtained by the improved DDPG algorithm, and the reward function, an intelligent missile terminal guidance scheme is proposed. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results of the proposed scheme are compared with other methods to further demonstrate its superior performance.
Keywords: deep deterministic policy gradient, infrared decoy, maneuvering target, reinforcement learning, terminal guidance law
A DDPG-based solution for optimal consensus of continuous-time linear multi-agent systems
19
Authors: LI Ye, LIU ZhongXin, LAN Ge, SADER Malika, CHEN ZengQiang. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2023, No. 8, pp. 2441-2453
Modeling a system in engineering applications is a time-consuming and labor-intensive task, as system parameters may change with temperature, component aging, etc. In this paper, a novel data-driven, model-free optimal controller based on deep deterministic policy gradient (DDPG) is proposed to address the problem of continuous-time leader-following multi-agent consensus. To deal with the dimensional explosion of the state and action spaces, two different types of neural networks are used to approximate them instead of the time-consuming state iteration process. With minimal energy consumption, the proposed controller achieves consensus based only on the consensus error and does not require any initial admissible policies. Besides, the controller is self-learning, which means it can achieve optimal control by learning in real time as the system parameters change. Finally, proofs of convergence and stability, as well as simulation experiments, are provided to verify the algorithm's effectiveness.
Keywords: leader-following consensus, optimal control, reinforcement learning, deep deterministic policy gradient (DDPG)
Deep Reinforcement Learning Enabled Bi-level Robust Parameter Optimization of Hydropower-dominated Systems for Damping Ultra-low Frequency Oscillation
20
Authors: Guozhou Zhang, Junbo Zhao, Weihao Hu, Di Cao, Nan Duan, Zhe Chen, Frede Blaabjerg. Journal of Modern Power Systems and Clean Energy (SCIE, EI, CSCD), 2023, No. 6, pp. 1770-1783
This paper proposes a robust and computationally efficient control method for damping ultra-low frequency oscillations (ULFOs) in hydropower-dominated systems. Unlike the existing robust optimization based control formulation, which can only deal with a limited number of operating conditions, the proposed method reformulates the control problem into a bi-level robust parameter optimization model. This allows a wide range of system operating conditions to be considered. To speed up the bi-level optimization process, a deep deterministic policy gradient (DDPG) based deep reinforcement learning algorithm is developed to train an intelligent agent. This agent can provide very fast lower-level decision variables to the upper-level model, significantly enhancing its computational efficiency. Simulation results demonstrate that the proposed method achieves much better damping control performance than the alternatives, with only slightly degraded dynamic response of the governor under various operating conditions.
Keywords: bi-level robust parameter optimization, deep reinforcement learning, deep deterministic policy gradient, ultra-low frequency oscillation, damping control, stability