Abstract
When reinforcement learning is applied to the intelligent control of a manipulator, the controller is trained through the manipulator's free exploration of the environment and the reward values the environment feeds back, which gives the manipulator autonomous perception and decision-making capability. However, unconstrained free exploration produces ineffective actions, which leads to long training cycles and slow convergence. To address this problem, this paper proposes a manipulator control algorithm based on dual optimization of reward and policy, Hybrid Reward Generative Adversarial Imitation Learning (HR-GAIL). On the reward side, a compound reward function is built on the improved discriminator by combining the task reward with the imitation reward. On the policy side, a binary variable loss function is constructed by combining the discriminator with the policy network, and the controller is updated as reward and policy are optimized alternately. Finally, a Panda manipulator is set up in the PyBullet environment, and a simulated task of grasping and moving a block is carried out to verify the proposed algorithm. The simulation results show that, on the same simulation task, HR-GAIL shortens the completion time by 16% and raises the grasping success rate by 5% compared with GAIL+SAC, and that discriminator training speed and grasping stability are also improved.
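The compound reward described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual HR-GAIL formula, so everything below is an assumption made for illustration: the function names, the mixing parameter `weight`, the weighted-sum form, and the common GAIL-style imitation reward -log(1 - D(s, a)), where D(s, a) is taken to be the discriminator's estimate that a state-action pair came from the expert.

```python
import math


def imitation_reward(d_prob, eps=1e-8):
    """GAIL-style imitation reward from a discriminator output.

    d_prob is assumed to be D(s, a) in (0, 1): the estimated probability
    that the state-action pair came from the expert demonstrations.
    """
    # Clamp away from 0 and 1 so the logarithm stays finite.
    d = min(max(d_prob, eps), 1.0 - eps)
    return -math.log(1.0 - d)


def hybrid_reward(task_reward, d_prob, weight=0.5):
    """Weighted sum of task reward and imitation reward.

    `weight` is a hypothetical mixing coefficient, not a value taken
    from the paper: 0 uses only the task reward, 1 only imitation.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * task_reward + weight * imitation_reward(d_prob)
```

With this form, a pair that the discriminator judges expert-like receives a larger imitation term, while the task term continues to reward progress on the grasping task itself.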
Authors
申珅
曾建潮
秦品乐
SHEN Shen; ZENG Jianchao; QIN Pinle (School of Electrical and Control Engineering, North University of China, Taiyuan 030051, China; School of Computer Science and Technology, North University of China, Taiyuan 030051, China)
Source
《中北大学学报(自然科学版)》
CAS
2023, No. 6, pp. 616-623 (8 pages)
Journal of North University of China(Natural Science Edition)
Keywords
reinforcement learning
reward function
manipulator control
algorithm optimization