Abstract
The standard Sarsa algorithm requires the state space to be discrete and small. In many real-world systems, however, the state space is continuous, or discrete but very large, so storing every state-action pair (SAP) would demand too much memory. This paper proposes using a BP network queue to store the SAPs. Experiments show that this approach solves the problem of representing Q values when the state space is too large.
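To make the idea concrete, the following is a minimal sketch, assuming a plain Sarsa loop in which Q(s, a) is approximated by a single small backpropagation-trained (BP) network instead of a lookup table. The toy corridor environment, network size, and learning rates are illustrative assumptions; the paper's specific BP network queue arrangement is not reproduced here.

```python
# Sketch: Sarsa with a BP (backpropagation-trained) network approximating Q(s, a).
# Environment, network size, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 2          # toy corridor: move left / move right
HIDDEN = 16
ALPHA = 0.01           # network learning rate
GAMMA = 0.95
EPSILON = 0.1

# One-hidden-layer BP network mapping a continuous scalar state to Q-values.
W1 = rng.normal(scale=0.1, size=(HIDDEN, 1))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, HIDDEN))
b2 = np.zeros(N_ACTIONS)

def forward(s):
    """Return hidden activations and Q-values for scalar state s."""
    h = np.tanh(W1 @ np.array([s]) + b1)
    return h, W2 @ h + b2

def update(s, a, target):
    """One backpropagation step pushing Q(s, a) toward the Sarsa target."""
    h, q = forward(s)
    err = q[a] - target                       # TD error for the taken action
    dW2 = np.zeros_like(W2); dW2[a] = err * h
    db2 = np.zeros_like(b2); db2[a] = err
    dh = err * W2[a] * (1 - h ** 2)           # backprop through tanh
    W2 -= ALPHA * dW2; b2 -= ALPHA * db2
    W1 -= ALPHA * np.outer(dh, [s]); b1 -= ALPHA * dh

def policy(s):
    """Epsilon-greedy action selection from the network's Q-values."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(forward(s)[1]))

# Toy continuous-state task: walk right along [0, 1]; reward 1 at the right end.
for episode in range(200):
    s = 0.0
    a = policy(s)
    while True:
        s_next = min(max(s + (0.1 if a == 1 else -0.1), 0.0), 1.0)
        done = s_next >= 1.0
        r = 1.0 if done else 0.0
        a_next = policy(s_next)
        _, q_next = forward(s_next)
        target = r if done else r + GAMMA * q_next[a_next]   # Sarsa target
        update(s, a, target)
        if done:
            break
        s, a = s_next, a_next
```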
Source
Computer Technology and Development (《计算机技术与发展》), 2006, No. 1, pp. 30-32 (3 pages)
Keywords
reinforcement learning
agent
Markov decision process (MDP)
back propagation (BP) network
state-action pair (SAP)