This paper proposes a method for learning and generating multiple action sequences of a humanoid robot. First, all the basic action sequences, also called primitive behaviors, are learned by a recurrent neural network with parametric bias (RNNPB), and the values of the parametric bias (PB) internal nodes, which determine which primitive behavior the network outputs, are obtained. The RNN is trained with the backpropagation through time (BPTT) method. Then, to generate the learned behaviors, or a more complex behavior that combines the primitive behaviors, a reinforcement learning algorithm, Q-learning (QL), is adopted to determine which PB value is appropriate for the generation. Finally, the effectiveness of the proposed method was confirmed by experiments using a real humanoid robot.
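As a rough illustration of the Q-learning step described in this abstract, the sketch below uses tabular Q-learning to pick one PB value per step of a composite behavior. It is not the authors' implementation. The state definition (the step index in the sequence), the reward (1 when the chosen PB index matches a hypothetical target primitive, 0 otherwise), the hyperparameters, and the names `q_learn_pb_sequence` and `greedy_policy` are all assumptions made for this example; the RNNPB itself is not modeled, and each PB value is reduced to an integer index.

```python
import random


def q_learn_pb_sequence(target, n_pb, episodes=2000, alpha=0.5,
                        gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (step, PB index) pairs.

    target: hypothetical list giving the desired primitive at each step.
    n_pb:   number of discrete PB values (one per learned primitive).
    """
    rng = random.Random(seed)
    n_steps = len(target)
    # q[s][a] estimates the return of choosing PB index a at step s.
    q = [[0.0] * n_pb for _ in range(n_steps)]
    for _ in range(episodes):
        for s in range(n_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(n_pb)
            else:
                a = max(range(n_pb), key=lambda i: q[s][i])
            # Assumed reward: 1 if the selected primitive is the desired one.
            reward = 1.0 if a == target[s] else 0.0
            # The next state is simply the next step; the last step is terminal.
            next_best = max(q[s + 1]) if s + 1 < n_steps else 0.0
            # Standard Q-learning update.
            q[s][a] += alpha * (reward + gamma * next_best - q[s][a])
    return q


def greedy_policy(q):
    """Return the PB index with the highest Q-value at each step."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in q]


if __name__ == "__main__":
    target = [0, 2, 1, 2]  # hypothetical composite behavior
    q_table = q_learn_pb_sequence(target, n_pb=3)
    print(greedy_policy(q_table))
```

On a real robot the reward would presumably come from how well the generated motion matches the desired behavior, rather than from a known target list as assumed here.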
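Current smart-car motion algorithms mostly rely on conventional PID control, whose weaknesses are slow response and relatively large balance error. This paper proposes a precise PID control algorithm based on a backpropagation neural network (Backpropagation Through Time, BPTT), implemented in a high-precision smart-car seesaw servo control system built around the STM32F103C8T6. The MPU-6050, an integrated three-dimensional gyroscope and accelerometer sensor module, detects the output signal and passes it to the main controller to update the control strategy, so that the smart car reaches the balance position smoothly. Repeated tests show that the smart car reaches a maximum travel speed of 3.25 m/s and an average speed of 2.78 m/s over the full run, with a measured maximum deviation of 10.7 mm, which is within the allowable error range. This indicates that the neural network servo control system offers high control precision, fast response, and good real-time performance.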