Journal articles
2 articles found
A policy gradient algorithm integrating long and short-term rewards for soft continuum arm control. Cited by: 2
1
Authors: DONG Xiang, ZHANG Jing, CHENG Long, XU WenJun, SU Hang, MEI Tao. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2022, No. 10, pp. 2409-2419 (11 pages)
Abstract: The soft continuum arm is widely used in industrial production and daily life owing to its superior safety and flexibility. Reinforcement learning is a powerful technique for soft arm continuous control problems because it can learn an effective control policy without a known system model. However, it often suffers from high sample complexity and requires large amounts of training data, which limits its effectiveness in soft arm control. To overcome this issue, this paper proposes an improved policy gradient method, policy gradient integrating long and short-term rewards (PGLS). The short-term rewards provide more dynamics-aware exploration directions for policy learning and improve the exploration efficiency of the algorithm. PGLS can be integrated into existing policy gradient algorithms such as deep deterministic policy gradient (DDPG). The overall control framework is implemented and demonstrated in a dynamics simulation environment. Simulation results show that the approach can effectively control the soft arm to reach and track targets. Compared with DDPG and other model-free reinforcement learning algorithms, PGLS substantially improves convergence speed and performance. In addition, a fluid-driven soft manipulator is designed and fabricated, which can be used to verify PGLS in real experiments in future work.
Keywords: soft arm control; Cosserat rod; deep reinforcement learning; policy gradient algorithm; high sample complexity
A UAV collaborative defense scheme driven by DDPG algorithm
2
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 5, pp. 1211-1224 (14 pages)
Abstract: The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning approaches, one based on value iteration and one based on policy iteration. With DDPG, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense scheme for swarms of unmanned aerial vehicles (UAVs) driven by DDPG is developed and validated, and it shows promising practical value for defense. We address the sparse reward problem of reinforcement learning in long-term tasks by building a reward function for the UAV swarm, and we optimize the learning process of the artificial neural network based on DDPG to reduce oscillation during learning. Experimental results show that DDPG can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy and promoting the intelligent development and decision-making of UAV swarms.
Keywords: deep deterministic policy gradient (DDPG) algorithm; unmanned aerial vehicle (UAV) swarm; task decision making; deep reinforcement learning; sparse reward problem
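Both abstracts build on DDPG. As a rough illustration of the two update rules at its core, the critic's bootstrapped target computed with target networks and the soft (Polyak) update of those target networks, here is a minimal sketch. It assumes simple linear actor and critic approximators; the function names `actor`, `critic`, `critic_target`, and `soft_update` are hypothetical, and this is not the implementation used in either paper.

```python
import numpy as np

# Hypothetical linear actor and critic, used only to illustrate DDPG's update rules.
def actor(theta, s):
    # Deterministic policy mu(s) = tanh(theta @ s), giving a bounded continuous action
    return np.tanh(theta @ s)

def critic(w, s, a):
    # Q(s, a) approximated as a linear function of the concatenated state-action vector
    return float(w @ np.concatenate([s, a]))

def critic_target(r, s_next, done, w_targ, theta_targ, gamma=0.99):
    # y = r + gamma * (1 - done) * Q'(s', mu'(s')), evaluated with the target networks
    a_next = actor(theta_targ, s_next)
    return r + gamma * (1.0 - done) * critic(w_targ, s_next, a_next)

def soft_update(target, online, tau=0.005):
    # Polyak averaging: target <- tau * online + (1 - tau) * target
    return tau * online + (1.0 - tau) * target
```

Because DDPG is off-policy, `r`, `s_next`, and `done` would normally come from transitions sampled out of a replay buffer; the critic is regressed toward `critic_target`, and the actor is updated along the gradient of the critic with respect to the action.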