Abstract
An improved reinforcement learning algorithm is used to balance a double inverted pendulum when no model of the system is available and the agent has no prior control experience. The algorithm requires neither a prediction model nor model identification; it searches for the optimal policy online through the association and memory of its own networks. It approximates the value function with a neural network and updates the weights using the direct gradient and eligibility traces, which allows it to handle tasks with continuous state and action spaces effectively. Computer simulations show that the algorithm learns to control a linear double inverted pendulum system in a relatively short time.
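The weight update named in the abstract (a neural-network value function trained with the gradient and eligibility traces) can be illustrated with a minimal sketch of semi-gradient TD(lambda). This is not the paper's implementation: the abstract gives no architecture or parameters, so the class name `TDLambdaCritic`, the single tanh hidden layer, the six-dimensional state, the failure reward of -1, and the values of `alpha`, `gamma`, and `lam` are all assumptions chosen for illustration. Only the value-estimation part is shown; action selection is omitted.

```python
import math
import random


class TDLambdaCritic:
    """Hypothetical one-hidden-layer value network trained with TD(lambda).

    All sizes and hyperparameters are illustrative assumptions,
    not values taken from the paper.
    """

    def __init__(self, n_in, n_hidden, alpha=0.05, gamma=0.95, lam=0.8, seed=0):
        rng = random.Random(seed)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        # Each hidden row and the output row carry a trailing bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.reset_traces()

    def reset_traces(self):
        """Zero the eligibility traces (done at the start of each episode)."""
        self.e1 = [[0.0] * len(row) for row in self.w1]
        self.e2 = [0.0] * len(self.w2)

    def _forward(self, x):
        xb = list(x) + [1.0]                      # input plus bias term
        h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]                            # hidden output plus bias term
        v = sum(w * u for w, u in zip(self.w2, hb))
        return v, xb, hb

    def value(self, x):
        return self._forward(x)[0]

    def update(self, x, reward, x_next, done):
        """One TD(lambda) step; returns the TD error."""
        v, xb, hb = self._forward(x)
        v_next = 0.0 if done else self.value(x_next)
        delta = reward + self.gamma * v_next - v
        decay = self.gamma * self.lam
        # Traces accumulate the gradient of V(x) with respect to each weight.
        for j, row in enumerate(self.e1):
            g = self.w2[j] * (1.0 - hb[j] ** 2)   # backprop through tanh
            for i in range(len(row)):
                row[i] = decay * row[i] + g * xb[i]
        for j in range(len(self.e2)):
            self.e2[j] = decay * self.e2[j] + hb[j]
        # Move every weight along its trace, scaled by the TD error.
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.alpha * delta * self.e1[j][i]
        for j in range(len(self.w2)):
            self.w2[j] += self.alpha * delta * self.e2[j]
        return delta


# Usage: repeatedly present a failing state (assumed reward -1 on failure).
critic = TDLambdaCritic(n_in=6, n_hidden=4)
state = [0.1, -0.2, 0.05, 0.0, 0.02, -0.01]  # made-up state vector
for _ in range(300):
    critic.reset_traces()
    critic.update(state, -1.0, state, True)
print(round(critic.value(state), 3))
```

In this sketch the traces let a single TD error adjust the weights for recently visited states as well as the current one, which is the usual motivation for combining eligibility traces with a gradient-based network update.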
Source
Computer Simulation (《计算机仿真》)
CSCD
2006, No. 4, pp. 305-308 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60375017)
Keywords
Reinforcement learning
Inverted pendulum
Eligibility traces