Abstract
The Deep Deterministic Policy Gradient (DDPG) algorithm performs well on continuous control problems. This paper proposes a DDPG-based control method for the motion control of a two-link robotic arm. To speed up convergence, the method incorporates multi-goal learning, so that the arm can also obtain rewards from positions it has already reached. It also modifies DDPG's experience replay mechanism: samples are stored in classes according to their importance, and samples of higher importance are selected preferentially when the network is trained. Experimental results show that DDPG combined with multi-goal learning and classified experience replay achieves better performance.
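The importance-based storage of replay samples described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how importance is measured, how many classes are used, or how the classes are mixed in a batch, so everything here is an assumption: the class name `ClassifiedReplayBuffer`, the two pools, the `threshold` on a caller-supplied importance score (for example an absolute TD error), and the `high_fraction` mixing ratio are all hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Hypothetical two-class replay buffer (not the paper's implementation).

    Transitions are stored in a "high" or "low" pool according to an
    importance score supplied by the caller, and each batch draws
    preferentially from the high-importance pool.
    """

    def __init__(self, capacity=10000, threshold=1.0, high_fraction=0.75, seed=None):
        self.high = deque(maxlen=capacity)  # high-importance transitions
        self.low = deque(maxlen=capacity)   # low-importance transitions
        self.threshold = threshold          # assumed cut-off between the classes
        self.high_fraction = high_fraction  # assumed share of a batch taken from `high`
        self.rng = random.Random(seed)

    def add(self, transition, importance):
        # Classify on insertion; the oldest entry is dropped when a pool is full.
        pool = self.high if importance >= self.threshold else self.low
        pool.append(transition)

    def sample(self, batch_size):
        # Take the preferred share from the high pool, fill the rest from the
        # low pool, then top up from the high pool if the low pool ran short.
        n_high = min(len(self.high), round(batch_size * self.high_fraction))
        n_low = min(len(self.low), batch_size - n_high)
        n_high = min(len(self.high), batch_size - n_low)
        batch = self.rng.sample(list(self.high), n_high)
        batch += self.rng.sample(list(self.low), n_low)
        self.rng.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.high) + len(self.low)
```

In a DDPG training loop, a buffer like this would stand in for the uniform replay buffer: `add` is called after each environment step and `sample` before each gradient update. If both pools together hold fewer than `batch_size` transitions, `sample` returns a shorter batch.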
Authors
陈奎烨
葛群峰
高兴波
陈路
CHEN Kui-ye; GE Qun-feng; GAO Xing-bo; CHEN Lu (Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China)
Source
《无线通信技术》
2021, No. 3, pp. 17-22 (6 pages)
Wireless Communication Technology