
Robot Imitation Learning Method Based on Trajectory Probability Matching (Cited by: 1)
Abstract: Imitation learning is one of the main topics in research on robotic bionic mechanisms: a robot achieves bionic behavior by observing, understanding, learning, and imitating demonstrated actions. In the proposed framework, Gaussian processes represent both the demonstration trajectory, formed from the sampled discrete teaching signals, and the imitation trajectory generated by a policy with unknown parameters. Probabilistic model matching is thereby introduced into imitation learning: the KL divergence between the probability distributions of the two trajectories serves as the cost function, and gradient descent is used to find the optimal imitation control policy that minimizes it. The resulting policy is then applied to the imitating robot so that it completes the same task as the demonstration. Simulations of arm-swing imitation on an articulated robot manipulator show that the trajectory-probability-matching method reproduces the demonstrated swing behavior, with a simpler learning process and better learning performance than traditional methods.
Source: Computer Measurement & Control (《计算机测量与控制》), 2015, No. 11, pp. 3713-3716, 3720 (5 pages)
Funding: National Natural Science Foundation of China (61375086, 61075110); Specialized Research Fund for the Doctoral Program of Higher Education (20101103110007)
Keywords: imitation learning; probability model; trajectory matching; Gaussian process; control policy
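The core computation the abstract describes, comparing two Gaussian trajectory distributions with a KL divergence cost and descending its gradient, has a closed form when both distributions are multivariate Gaussians over the same discretized time steps:

$$\mathrm{KL}\big(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\big)=\tfrac{1}{2}\Big[\operatorname{tr}\!\big(\Sigma_2^{-1}\Sigma_1\big)+(\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)-k+\ln\tfrac{\det\Sigma_2}{\det\Sigma_1}\Big]$$

The sketch below is a minimal illustration of this idea, not the paper's implementation. It assumes the demonstration is summarized as a Gaussian N(mu_d, Sigma_d) over T time steps, parameterizes the imitation trajectory mean linearly in RBF features of time (a hypothetical choice; the paper's policy parameterization is not reproduced here), holds the imitation covariance fixed, and minimizes KL(imitation || demonstration) by gradient descent (one possible direction; the paper does not specify which argument order it uses).

```python
# Minimal sketch of trajectory probability matching (illustrative, not the
# paper's code). Both trajectories are Gaussians over T discretized time steps.
import numpy as np

def rbf_features(t, centers, width):
    """RBF basis over time: Phi[i, j] = exp(-(t_i - c_j)^2 / (2 * width^2))."""
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL( N(mu_p, cov_p) || N(mu_q, cov_q) )."""
    k = mu_p.size
    diff = mu_q - mu_p
    return 0.5 * (np.trace(np.linalg.solve(cov_q, cov_p))
                  + diff @ np.linalg.solve(cov_q, diff) - k
                  + np.linalg.slogdet(cov_q)[1] - np.linalg.slogdet(cov_p)[1])

# Demonstration: a toy arm-swing joint-angle profile with an RBF covariance
# standing in for the GP posterior built from the discrete teaching signals.
T = 50
t = np.linspace(0.0, 1.0, T)
mu_d = np.sin(2.0 * np.pi * t)
sq_dist = (t[:, None] - t[None, :]) ** 2
Sigma_d = 0.01 * np.exp(-sq_dist / (2.0 * 0.1 ** 2)) + 0.02 * np.eye(T)

# Imitation policy (hypothetical parameterization): mean mu_i = Phi @ theta,
# covariance held fixed, so only the trajectory mean is adapted.
centers = np.linspace(0.0, 1.0, 10)
Phi = rbf_features(t, centers, width=0.1)
Sigma_i = 0.01 * np.eye(T)
theta = np.zeros(len(centers))

# Gradient descent on KL(N_i || N_d). With Sigma_i fixed, the only
# theta-dependent term is the quadratic 0.5 * r^T Sigma_d^{-1} r with
# r = Phi @ theta - mu_d, whose gradient is Phi^T Sigma_d^{-1} r.
lr = 2e-4
for step in range(5001):
    grad = Phi.T @ np.linalg.solve(Sigma_d, Phi @ theta - mu_d)
    theta -= lr * grad
    if step % 1000 == 0:
        kl = gaussian_kl(Phi @ theta, Sigma_i, mu_d, Sigma_d)
        print(f"step {step:5d}  KL = {kl:8.3f}")
```

With the imitation covariance held fixed, only the Mahalanobis term of the KL depends on theta, so this sketch reduces to generalized least squares on the trajectory mean; the printed KL decreases and then plateaus at the irreducible covariance-mismatch term.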