Decentralized reinforcement learning optimal control for time varying constrained reconfigurable modular robot

Original title: 动态约束下可重构模块机器人分散强化学习最优控制

Cited by: 5

Abstract: Based on Action-Critic-Identifier (ACI) and Radial Basis Function (RBF) neural networks, a decentralized reinforcement learning optimal control method is proposed for reconfigurable modular robots subject to external time-varying (dynamic) constraints. The method addresses the continuous-time nonlinear optimal control problem for modular robot systems with strongly coupled uncertainties. The robot dynamics are described as a set of interconnected subsystems. Based on a continuous-time Markov decision process (MDP) performance index, ACI is combined with RBF neural networks to identify each subsystem's optimal value function, optimal control policy, and overall uncertainty term, so that the system satisfies the optimality condition of the Hamilton-Jacobi-Bellman (HJB) equation. As a result, the reconfigurable modular robot subsystems asymptotically track the desired trajectory, and the tracking error converges and remains bounded. System stability is proved using Lyapunov theory, and numerical simulations verify the effectiveness of the proposed decentralized control scheme.
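
For orientation, the HJB optimality condition named in the abstract can be written in a generic textbook form. This is only a sketch: it assumes input-affine subsystem error dynamics and a quadratic control penalty, and the symbols e_i, f_i, g_i, Q_i, R_i and V_i^* are illustrative choices, not notation taken from the paper.

```latex
% Assumed (not from the paper): error dynamics and cost of subsystem i,
% with R_i symmetric positive definite
\dot{e}_i = f_i(e_i) + g_i(e_i)\,u_i, \qquad
V_i\big(e_i(t)\big) = \int_t^{\infty} \big( Q_i(e_i) + u_i^{\top} R_i u_i \big)\, d\tau

% HJB optimality condition for the optimal value function V_i^*
0 = \min_{u_i} \Big[ Q_i(e_i) + u_i^{\top} R_i u_i
      + \big(\nabla V_i^{*}\big)^{\top} \big( f_i(e_i) + g_i(e_i)\,u_i \big) \Big]

% Minimizing control, from the stationarity condition 2 R_i u_i + g_i^T \nabla V_i^* = 0
u_i^{*} = -\tfrac{1}{2}\, R_i^{-1}\, g_i(e_i)^{\top}\, \nabla V_i^{*}
```

Under these assumptions, the three quantities the abstract says are identified map onto this form as follows: the optimal value function corresponds to V_i^*, the optimal control policy to u_i^*, and the overall uncertainty term to the unknown part of the dynamics f_i.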

Authors: Dong Bo (董博), Liu Keping (刘克平), Li Yuanchun (李元春)

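Affiliations: Department of Control Science and Engineering, Jilin University; Department of Control Engineering, Changchun University of Technology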

Source: Journal of Jilin University: Engineering and Technology Edition (《吉林大学学报（工学版）》), 2014, No. 5, pp. 1375-1384 (10 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).

Funding: National Natural Science Foundation of China (61374051, 60974010); Jilin Province Science and Technology Development Plan (20110705)

Keywords: automatic control technology; reconfigurable modular robot; reinforcement learning; nonlinear optimal control; decentralized control

Classification number: TP273 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

