Abstract
To address stable gait control for a walking biped robot, a deep reinforcement learning method based on an improved Deep Q-Network (DQN) is proposed. First, the DQN algorithm is combined with the deterministic policy gradient, and the critic network of the actor-critic network is replaced with a clipped Double-Q network, yielding an improved DQN. Then, a link model of the biped robot is established, and the improved network is used to train the robot, acting as the agent, in gait control on conventional flat ground. MATLAB simulation results show that, compared with the DQN and Deep Deterministic Policy Gradient (DDPG) algorithms, the proposed algorithm trains faster and its average reward curve is smoother. With CPU-only training, agent training completes after about 20 hours of deep reinforcement learning. The biped robot achieves stable, fast walking (average speed about 0.5 m/s) with small torque over a long distance (about 5 meters).
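The abstract does not give the update rule, so the following is only a minimal sketch of the standard clipped Double-Q target as it is commonly used in actor-critic methods, not the authors' implementation. The function names, the discount factor `gamma`, and the scalar inputs are assumptions made for illustration. The idea is that two target critics each estimate the value of the next state-action pair, and the smaller estimate is used in the temporal-difference target to curb overestimation.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Return the TD target r + gamma * (1 - done) * min(Q1', Q2').

    q1_next and q2_next are the two target-critic estimates for the next
    state and the target actor's action (hypothetical inputs; in practice
    they come from neural networks).
    """
    # Taking the minimum of the two estimates is the "clipping" step.
    min_q_next = min(q1_next, q2_next)
    # No bootstrapping from the next state once the episode has ended.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min_q_next


def batch_targets(rewards, dones, q1_nexts, q2_nexts, gamma=0.99):
    """Apply the clipped Double-Q target to every transition in a batch."""
    return [
        clipped_double_q_target(r, d, q1, q2, gamma)
        for r, d, q1, q2 in zip(rewards, dones, q1_nexts, q2_nexts)
    ]
```

Both critics would then be regressed toward the same target, which is what tends to smooth the learning signal compared with a single-critic target.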
Authors
FENG Chun (冯春)
ZHANG Yiwei (张祎伟)
HUANG Cheng (黄成)
JIANG Wenbiao (姜文彪)
WU Zhiwei (武之炜)
School of Aerospace and Mechanical Engineering, Changzhou Institute of Technology, Changzhou 213032, China
Source
Computer Integrated Manufacturing Systems (《计算机集成制造系统》)
EI
CSCD
PKU Core
2021, No. 8, pp. 2341-2349 (9 pages)
Funding
National Natural Science Foundation of China, Young Scientists Fund (Grant No. 11802040)
2018 Jiangsu Province Qing Lan Project for Outstanding Young Backbone Teachers (Grant No. A1-5501-19-003).