Aim To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods A kind of Q learning algorithm based on truncated TD( λ ) with adaptive scheme...Aim To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods A kind of Q learning algorithm based on truncated TD( λ ) with adaptive schemes of λ value selection addressed to absorbing Markov decision processes was presented and implemented on computers. Results and Conclusion Simulations on the shortest path searching problems show that using adaptive λ in the Q learning based on TTD( λ ) can speed up its convergence.展开更多
文摘Aim To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods A kind of Q learning algorithm based on truncated TD( λ ) with adaptive schemes of λ value selection addressed to absorbing Markov decision processes was presented and implemented on computers. Results and Conclusion Simulations on the shortest path searching problems show that using adaptive λ in the Q learning based on TTD( λ ) can speed up its convergence.