[2] Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge: MIT Press, 1998.
[3] Liu C, Xu X, Hu D. Multiobjective reinforcement learning: A comprehensive overview [J]. IEEE Trans on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2013, 99(4): 1-13.
[4] Sutton R S, Precup D, Singh S P. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1): 181-211.
[5] Parr R. Hierarchical control and learning for Markov decision processes [D]. Berkeley: University of California at Berkeley, 1998.
[6] Hengst B. Discovering hierarchical reinforcement learning [D]. Sydney: University of New South Wales, 2003.
[7] Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition [J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
[8] Hwang K S, Lin H Y, Hsu Y P, et al. Self-organizing state aggregation for architecture design of Q-learning [J]. Information Sciences, 2011, 181(13): 2813-2822.
[9] Ng A Y, Harada D, Russell S. Policy invariance under reward transformations: Theory and application to reward shaping [C] //Proc of the 16th Int Conf on Machine Learning. San Francisco: Morgan Kaufmann, 1999: 278-287.
[10] Bianchi R A C, Ribeiro C H C, Costa A H R. Accelerating autonomous learning by using heuristic selection of actions [J]. Journal of Heuristics, 2008, 14(2): 135-168.