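Abstract
To plan the trajectory of a two-degree-of-freedom manipulator, a new dynamic-search Q-learning algorithm is proposed. The algorithm plans the trajectory directly, without requiring a mathematical model of the manipulator; it dynamically adjusts the proportion parameter of the greedy strategy according to the learning progress, and provides a quantitative policy evaluation unit that is more objective and fair than traditional approaches. In addition, a dynamic update mechanism updates the learned experience online. Simulation results show that the new Q-learning algorithm brings the manipulator to the target position more quickly and achieves a globally optimal trajectory.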
To plan trajectories for a 2-DOF (two degrees of freedom) manipulator, we propose an improved Q-learning algorithm that does not need a mathematical model of the manipulator and can plan the trajectory directly. The algorithm dynamically adjusts the parameters of the greedy strategy according to the learning process. Simulation results show that, when the new algorithm is applied to 2-DOF manipulator trajectory planning, the manipulator reaches the target position more quickly and the resulting trajectory is globally optimal.
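The idea of adjusting the greedy-strategy parameter as learning progresses can be illustrated with a minimal tabular Q-learning sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two joints are discretized into `N` angle bins each, the target `GOAL` is a hypothetical joint configuration, the rewards are arbitrary, and `epsilon_for` uses a simple linear decay rather than the authors' dynamic-search rule. The quantitative evaluation unit and the dynamic update mechanism are not modeled.

```python
import random

N = 5                                          # assumed number of angle bins per joint
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # move joint 1 or joint 2 by one bin
GOAL = (4, 3)                                  # hypothetical target joint configuration


def step(state, action):
    """Apply an action, clamping each joint index to the range [0, N-1]."""
    q1 = min(max(state[0] + action[0], 0), N - 1)
    q2 = min(max(state[1] + action[1], 0), N - 1)
    nxt = (q1, q2)
    reward = 10.0 if nxt == GOAL else -1.0     # assumed reward shaping
    return nxt, reward, nxt == GOAL


def epsilon_for(episode, episodes, eps_start=0.9, eps_end=0.05):
    """Assumed linear schedule: explore heavily early, exploit late."""
    frac = episode / max(episodes - 1, 1)
    return eps_start + (eps_end - eps_start) * frac


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with an exploration rate that shrinks over episodes."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    Q = {((i, j), a): 0.0
         for i in range(N) for j in range(N) for a in range(n_actions)}
    for ep in range(episodes):
        eps = epsilon_for(ep, episodes)
        state = (0, 0)
        for _ in range(100):                   # cap on steps per episode
            if rng.random() < eps:
                a = rng.randrange(n_actions)   # explore
            else:
                a = max(range(n_actions), key=lambda k: Q[(state, k)])  # exploit
            nxt, r, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(Q[(nxt, k)] for k in range(n_actions))
            Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
            state = nxt
            if done:
                break
    return Q


def greedy_path(Q, start=(0, 0), max_steps=50):
    """Follow the learned greedy policy from start and return the visited states."""
    path, state = [start], start
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda k: Q[(state, k)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the sequence of joint-bin pairs visited under the learned policy. No model of the arm's kinematics or dynamics is used; the agent only sees states, actions, and rewards, which is the model-free property the abstract refers to.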
Source
《吉林大学学报(信息科学版)》
CAS
2013, No. 1, pp. 90-94 (5 pages)
Journal of Jilin University (Information Science Edition)
Funding
National Youth Fund project (61004067)
Science and Technology Fund of the Education Department of Heilongjiang Province (12511002)
Keywords
manipulator
Q-learning
greedy strategy
trajectory plan
quantitative judgment unit