2 Mitchell T M. Machine learning[M]. Beijing, China: Machine Press, 2004: 263-280.
3 Dayan P. The convergence of TD(λ) for general λ[J]. Machine Learning, 1992, 8: 341-362.
4 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey[J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
5 Watkins C, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
6 Moore A W, Atkeson C G. Prioritized sweeping: Reinforcement learning with less data and less real time[J]. Machine Learning, 1993, 13: 103-130.
7 Hu J, Wellman M P. Nash Q-learning for general-sum stochastic games[J]. Journal of Machine Learning Research, 2003, 4: 1039-1069.
8 Bradtke S J, Barto A G. Linear least-squares algorithms for temporal difference learning[J]. Machine Learning, 1996, 22(1-3): 33-57.
9 Bowling M. Convergence and no-regret in multiagent learning[C]. Advances in Neural Information Processing Systems, 2004.
10 Ilg W, Berns K. A learning architecture based on reinforcement learning for adaptive control of the walking machine LAURON[J]. Robotics and Autonomous Systems, 1995, 15: 323-334.