Motion Planning of Manipulator Based on Improved Soft Actor-Critic Algorithm
Abstract Deep reinforcement learning algorithms applied to manipulator motion planning suffer from low exploration efficiency, slow convergence, and even failure to converge when the state space is high-dimensional and high precision is required. To address these problems, this study builds on the SAC (Soft Actor-Critic) algorithm, introduces an asynchronous advantage mechanism, and proposes the AA-SAC (Asynchronous Advantage Soft Actor-Critic) algorithm. The algorithm replaces the original V network with a Q target network, which effectively reduces the variance of the Q network, and n independent processes can be trained in parallel, which improves training efficiency. The experience replay buffer of AA-SAC is divided into two parts so that high-quality experience data is stored and sampled separately, improving the utilization of effective experience data. Simulation results show that AA-SAC performs best in convergence speed, success rate, and stability. Compared with the SAC algorithm, AA-SAC converges 3000 episodes earlier. After convergence, the success rate of AA-SAC reaches 96%, which is 6% higher than SAC and 26% higher than DDPG (Deep Deterministic Policy Gradient).
Authors TANG Chao; ZHANG Fan (School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)
Source Electronic Science and Technology, 2024, No. 11, pp. 47-54 (8 pages)
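Funding Science and Technology Support Program in the Biomedical Field, Science and Technology Commission of Shanghai Municipality (17441901200).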
Keywords deep reinforcement learning; asynchronous advantage; SAC algorithm; experience replay buffer; manipulator; motion planning; minimally invasive surgery; CoppeliaSim
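
The two-part experience replay buffer described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how "high-quality" experience is identified or how the two parts are mixed when sampling. The class name `DualReplayBuffer`, the reward-threshold criterion, and the `good_fraction` mixing ratio are all assumptions made here for illustration.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Replay buffer split into two pools that are stored and sampled separately.

    Assumption: a transition counts as "high-quality" when its reward is at
    least `reward_threshold`. The paper's actual criterion is not given in
    the abstract.
    """

    def __init__(self, capacity, reward_threshold, good_fraction=0.5, seed=None):
        self.normal = deque(maxlen=capacity)  # ordinary transitions
        self.good = deque(maxlen=capacity)    # high-quality transitions
        self.reward_threshold = reward_threshold
        self.good_fraction = good_fraction    # hypothetical share of each batch
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Route the transition to one of the two pools by its reward."""
        transition = (state, action, reward, next_state, done)
        pool = self.good if reward >= self.reward_threshold else self.normal
        pool.append(transition)

    def sample(self, batch_size):
        """Draw from each pool separately, then shuffle the combined batch.

        If a pool holds fewer items than requested, the batch is smaller.
        """
        n_good = min(int(batch_size * self.good_fraction), len(self.good))
        n_normal = min(batch_size - n_good, len(self.normal))
        batch = self.rng.sample(list(self.good), n_good)
        batch += self.rng.sample(list(self.normal), n_normal)
        self.rng.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.normal) + len(self.good)


# Usage: ten dummy transitions whose rewards are 0..9, threshold 8.
buffer = DualReplayBuffer(capacity=100, reward_threshold=8, seed=0)
for r in range(10):
    buffer.add(state=r, action=0, reward=r, next_state=r + 1, done=False)
batch = buffer.sample(4)
```

Sampling the pools separately keeps rare successful transitions in every batch, which is one plausible reading of how the split raises the utilization of effective experience data.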