Funding: This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2010-0012609).
Abstract: This paper presents an extended Dyna-Q algorithm that improves the efficiency of the standard Dyna-Q algorithm. In the first episodes of the standard Dyna-Q algorithm, the agent travels blindly in search of the goal position. To overcome this weakness, our approach uses a maximum-likelihood model of all state-action pairs to choose actions and update Q-values during the first few episodes. Our algorithm is compared with the one-step Q-learning algorithm and the standard Dyna-Q algorithm on the path-planning problem in maze environments. Experimental results show that the proposed algorithm is more efficient than both the one-step Q-learning algorithm and the standard Dyna-Q algorithm, especially in environments with a large number of states.
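The abstract gives no pseudocode, so the following is only a minimal sketch of a tabular Dyna-Q agent that maintains a maximum-likelihood model (most frequently observed successor state and running mean reward per state-action pair) and front-loads extra model-based planning updates into the first few episodes, in the spirit the abstract describes. The environment interface (env.reset, env.step) and all parameter names (model_episodes, planning_steps, etc.) are illustrative assumptions, not the paper's implementation.

    import random
    from collections import defaultdict

    def extended_dyna_q(env, n_actions, episodes=50, model_episodes=5,
                        planning_steps=10, alpha=0.1, gamma=0.95, epsilon=0.1):
        # Sketch only: env.reset() -> state, env.step(a) -> (state, reward, done)
        # are assumed interfaces, not taken from the paper.
        Q = defaultdict(float)                          # Q[(s, a)]
        counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s'] visit counts
        rewards = defaultdict(float)                    # running mean reward of (s, a)
        visited = []                                    # observed (s, a) pairs for planning

        def ml_next(s, a):
            # Maximum-likelihood prediction: most frequently observed successor.
            succ = counts[(s, a)]
            return max(succ, key=succ.get) if succ else None

        for ep in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # Epsilon-greedy action selection from the current Q-values.
                if random.random() < epsilon:
                    a = random.randrange(n_actions)
                else:
                    a = max(range(n_actions), key=lambda x: Q[(s, x)])
                s2, r, done = env.step(a)

                # Update the maximum-likelihood model (counts and mean reward).
                counts[(s, a)][s2] += 1
                n = sum(counts[(s, a)].values())
                rewards[(s, a)] += (r - rewards[(s, a)]) / n
                if (s, a) not in visited:
                    visited.append((s, a))

                # Direct reinforcement-learning (Q-learning) update.
                best_next = max(Q[(s2, x)] for x in range(n_actions))
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

                # Planning: replay simulated experience from the learned model.
                # Front-loading extra updates in the early episodes (ep <
                # model_episodes) is one plausible reading of the extension;
                # the 5x multiplier is an arbitrary illustrative choice.
                extra = planning_steps * 5 if ep < model_episodes else planning_steps
                for _ in range(extra):
                    ps, pa = random.choice(visited)
                    ps2 = ml_next(ps, pa)
                    if ps2 is None:
                        continue
                    pbest = max(Q[(ps2, x)] for x in range(n_actions))
                    Q[(ps, pa)] += alpha * (rewards[(ps, pa)] + gamma * pbest
                                            - Q[(ps, pa)])
                s = s2
        return Q

The planning loop is what distinguishes Dyna-Q from one-step Q-learning: each real step is followed by several simulated updates drawn from the learned model, which is why model-based planning in the first episodes can reduce the blind exploration the abstract mentions.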