[2] Silver D, Sutton R, Müller M. Temporal-difference search in computer Go[J]. Machine Learning, 2012, 87: 183-219.
[3] Wang F Y, Jin N, Liu D R, et al. Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound[J]. IEEE Transactions on Neural Networks, 2011, 22(1): 24-36.
[4] Hafner R, Riedmiller M. Reinforcement learning in feedback control: challenges and benchmarks from technical process control[J]. Machine Learning, 2011, 84: 137-169.
[5] Choi J, Kim K E. Inverse reinforcement learning in partially observable environments[J]. Journal of Machine Learning Research, 2011, 12: 691-730.
[6] Meltzoff A N, Kuhl P K, Movellan J, et al. Foundations for a new science of learning[J]. Science, 2009, 325: 284-288.
[7] Kovacs T, Egginton R. On the analysis and design of software for reinforcement learning with a survey of existing systems[J]. Machine Learning, 2011, 84: 7-49.
[8] Doshi-Velez F, Pineau J, Roy N. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs[J]. Artificial Intelligence, 2012, 187-188: 115-132.
[9] Frommberger L, Wolter D. Structural knowledge transfer by spatial abstraction for reinforcement learning agents[J]. Adaptive Behavior, 2010, 18(6): 531-539.
[10] Kozlova O. Hierarchical & Factored reinforcement learning[D]. Paris: Université Pierre et Marie Curie, 2010.