Abstract
To address the limitations of existing control algorithms for inverted pendulum systems, an end-to-end control method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is proposed, combining reinforcement learning and deep learning. First, a virtual simulation environment is built from the inverted pendulum's dynamic model, and a sparse reward function is designed. Next, a deep neural network is constructed to form an end-to-end control model mapping the pendulum's state inputs to action outputs, with the network structure and parameters determined by analyzing the pendulum's characteristics. Finally, the model trained in the virtual simulation environment is transferred to a physical inverted pendulum platform and further optimized. Experimental results show that the model generated by this method effectively establishes the mapping between the pendulum's state and the executed action, offering a useful reference for motion control.
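The abstract mentions a sparse reward function designed for the simulated pendulum but does not give its form. As an illustration only, a common sparse scheme for cart-pole-style inverted pendulums rewards the agent solely for staying balanced, with no shaping terms; the angle and track thresholds below are assumed values, not the paper's:

```python
import math

# Hypothetical sparse reward sketch for an inverted pendulum on a cart.
# The agent earns reward only while the pole stays within an upright-angle
# band and the cart remains on the track; any failure state yields zero.
# ANGLE_LIMIT and POSITION_LIMIT are illustrative assumptions.

ANGLE_LIMIT = math.radians(12.0)   # assumed upright tolerance, in radians
POSITION_LIMIT = 2.4               # assumed track half-length, in meters

def sparse_reward(theta: float, x: float) -> float:
    """Return 1.0 while balanced, 0.0 once the pole falls or the cart leaves the track."""
    balanced = abs(theta) < ANGLE_LIMIT and abs(x) < POSITION_LIMIT
    return 1.0 if balanced else 0.0
```

Because such a reward gives no gradient toward the goal until balance is achieved, it makes exploration harder than a shaped reward, which is part of what motivates using an off-policy method like TD3 with replay.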
Authors
HE Weidong; LIU Xiaochen; ZHANG Yinghui; YAO Shixuan
(School of Mechanical Engineering, Dalian Jiaotong University, Dalian 116028, China; College of Software, Dalian Foreign Language University, Dalian 116044, China)
Source
Journal of Dalian Jiaotong University (《大连交通大学学报》)
CAS
2023, No. 1, pp. 38-44 (7 pages)
Keywords
deep reinforcement learning
inverted pendulum control
TD3
end-to-end
sparse reward function