Motion Planning of Manipulator Based on Improved Soft Actor-Critic Algorithm

Abstract: Deep reinforcement learning algorithms applied to manipulator motion planning in high-dimensional state spaces with high precision requirements suffer from low exploration efficiency, slow convergence, and in some cases failure to converge. To address these problems, this study introduces an asynchronous advantage mechanism into the SAC (Soft Actor-Critic) algorithm and proposes the AA-SAC (Asynchronous Advantage Soft Actor-Critic) algorithm. The algorithm replaces the original V network with a Q target network, which effectively reduces the variance of the Q network, and n independent processes can train in parallel, which improves training efficiency. The experience replay pool of AA-SAC is divided into two parts so that high-quality experience data are stored and sampled separately, improving the utilization of effective experience data. Simulation results show that, among the compared algorithms, AA-SAC performs best in convergence speed, success rate, and stability. Compared with SAC, AA-SAC converges 3000 episodes earlier. After convergence, the success rate of AA-SAC reaches 96%, which is 6% higher than SAC and 26% higher than DDPG (Deep Deterministic Policy Gradient).
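The two-part experience replay pool described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how "high-quality" experience is identified or in what ratio the two parts are sampled, so the reward threshold, the `elite_fraction` mixing ratio, and every name below (`DualReplayBuffer`, `regular`, `elite`) are hypothetical choices made for illustration, not the authors' implementation.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Replay pool split into two parts that are stored and sampled separately."""

    def __init__(self, capacity=10000, reward_threshold=0.0,
                 elite_fraction=0.25, seed=None):
        # Each part is a bounded FIFO queue; old transitions are evicted first.
        self.regular = deque(maxlen=capacity)
        self.elite = deque(maxlen=capacity)
        # Assumption: a transition counts as high quality when its reward
        # exceeds this threshold. The paper's actual criterion is not given.
        self.reward_threshold = reward_threshold
        # Assumption: target share of each batch drawn from the elite part.
        self.elite_fraction = elite_fraction
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        if reward > self.reward_threshold:
            self.elite.append(transition)
        else:
            self.regular.append(transition)

    def sample(self, batch_size):
        # Draw from each part separately, capped by what each part holds.
        n_elite = min(len(self.elite),
                      int(round(batch_size * self.elite_fraction)))
        n_regular = min(len(self.regular), batch_size - n_elite)
        batch = (self.rng.sample(list(self.elite), n_elite)
                 + self.rng.sample(list(self.regular), n_regular))
        self.rng.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.elite) + len(self.regular)
```

In a setup with n parallel worker processes, each worker would call `add` on a shared or per-worker buffer and the learner would call `sample`; how AA-SAC actually shares data between processes is not specified in the abstract.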

Authors: TANG Chao; ZHANG Fan (School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)

Affiliation: School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science

Source: Electronic Science and Technology (《电子科技》), 2024, No. 11, pp. 47-54 (8 pages)

Funding: Science and Technology Support Program in Biomedicine, Shanghai Municipal Science and Technology Commission (17441901200).

Keywords: deep reinforcement learning; asynchronous advantage; SAC algorithm; experience replay pool; manipulator motion planning; minimally invasive surgery; CoppeliaSim

Classification: TP241 [Automation and Computer Technology: Detection Technology and Automatic Devices]

