Abstract
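In the classic Q-learning reinforcement learning model, the learning rate is a fixed parameter and therefore cannot effectively reflect the dynamic process of cognitive learning. A Q-learning reinforcement learning model that represents the learning rate as a time-varying parameter is proposed, together with a method for estimating the stage-wise learning rate from recent historical behavioral data. To evaluate and verify the model's performance, a three-phase dynamic experimental paradigm was designed in which the conditioned stimulus and the reward for operant behavior go from unrelated to related and back to unrelated. The paradigm is used to observe and analyze how pigeons' learning behavior changes under random reinforcement, fixed reinforcement, and extinction of the fixed reinforcement relationship. An animal touch-screen behavioral system was used to train three pigeons on a color-stimulus, screen-pecking decision task, and behavioral data from different training sessions were used to obtain least-squares estimates of the dynamic learning rate. The analysis shows that the model yields smaller behavior prediction errors, that the error decreases and converges faster, and that the dynamic course of the learning rate effectively reflects the animal's internal learning state during cognitive behavioral training.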
The learning rate in the classic Q-learning model is a fixed parameter, which cannot reflect the dynamic learning process of an agent. A new Q-learning model with a time-varying learning rate is therefore proposed. To evaluate and verify the performance of this model, a three-phase paradigm was first designed in which the relationship between the conditioned stimulus and operant behavior changed from unrelated to related and finally back to unrelated. Next, an animal touch-screen behavioral system was used to carry out decision-making cognitive training of three pigeons. Data from different training sessions were used to estimate the stage-wise optimal learning rate by least squares estimation. The results indicate that the Q-learning model with a dynamic learning rate obtains a smaller behavior prediction error, and that the dynamic course of the learning rate effectively reflects the inherent learning state during animal cognitive behavioral training.
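The abstract does not spell out the update rule or the fitting criterion, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a simple value update Q ← Q + α(r − Q) per stimulus-action pair, a logistic choice rule with a hypothetical inverse temperature `beta`, and a grid search over α that minimizes the squared error between the predicted peck probability and the observed choice within each session. All function names, the trial format `(stimulus, action, reward)`, and the grid are assumptions made for illustration.

```python
import math


def predict_session(q, trials, alpha, beta=3.0):
    """Replay one session with a given learning rate.

    trials: list of (stimulus, action, reward), action is 1 (peck) or 0 (no peck).
    Returns (sum of squared prediction errors, updated Q table).
    beta is an assumed inverse temperature for the logistic choice rule.
    """
    q = dict(q)  # copy so the caller's table is not modified
    sse = 0.0
    for stimulus, action, reward in trials:
        q_go = q.get((stimulus, 1), 0.0)
        q_nogo = q.get((stimulus, 0), 0.0)
        # Predicted probability of pecking before the outcome is seen
        p_go = 1.0 / (1.0 + math.exp(-beta * (q_go - q_nogo)))
        sse += (action - p_go) ** 2
        # Value update for the action actually taken
        key = (stimulus, action)
        old = q.get(key, 0.0)
        q[key] = old + alpha * (reward - old)
    return sse, q


def estimate_session_alpha(q, trials, grid=None):
    """Least-squares estimate of alpha for one session via grid search."""
    grid = grid or [i / 100 for i in range(1, 100)]
    best = min(grid, key=lambda a: predict_session(q, trials, a)[0])
    return best, predict_session(q, trials, best)[1]


def fit_dynamic_learning_rate(sessions):
    """Estimate one learning rate per session, carrying Q values forward."""
    q = {}
    alphas = []
    for trials in sessions:
        alpha, q = estimate_session_alpha(q, trials)
        alphas.append(alpha)
    return alphas
```

Calling `fit_dynamic_learning_rate(sessions)` with one trial list per session returns one estimated learning rate per session; plotting that sequence across the three phases is one way to inspect how the estimated rate changes over training. A continuous optimizer could replace the grid search, which is used here only to keep the sketch dependency-free.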
Source
《科学技术与工程》
Peking University Core Journal (北大核心)
2017, No. 13, pp. 120-125 (6 pages)
Science Technology and Engineering