Abstract
Many algorithms are currently used for manipulator control. Traditional methods such as adaptive PD control and fuzzy adaptive control mostly depend on a mathematical model of the arm. Reinforcement learning methods such as DQN (Deep Q Network) and Sarsa have also been applied, but in continuous, high-dimensional action spaces they suffer from low learning efficiency, difficulty in designing the reward, and poor control performance. This paper studies the use of the PPO (Proximal Policy Optimization) algorithm to make a robot arm grasp a target at an arbitrary position, and compares the experimental data with those of the Actor-Critic algorithm. The comparison shows that PPO achieves good control performance and learns efficiently and stably.
Authors
GUO Kun (郭坤)
WU Qu (武曲)
ZHANG Yi (张义)
School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2021, No. 4, pp. 222-225 (4 pages)
Funding
Supported by the Natural Science Foundation of Shandong Province (ZR2017BF043).