
基于行动分值的强化学习与奖赏优化 (Cited by: 1)

Action Values Based Reinforcement Learning and Optimized Reward Functions
Abstract: To address the slow convergence of reinforcement learning algorithms and the need for better-designed reward functions, a new reinforcement learning algorithm is proposed that uses action values as the basis on which an agent selects actions. Because action values are more flexible than traditional state values, it is easier to design better-optimized reward functions around them and thereby improve learning performance. Building on action values, an exponential function and a logarithmic function are used to determine the reward value and the discount rate dynamically, which speeds up the agent's selection of optimal actions. A computer simulation of a maze problem shows that the new algorithm significantly reduces the number of actions the agent executes before convergence, thus increasing convergence speed.
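The abstract describes the mechanism only at a high level: a per-action score ("action value") drives action selection, and an exponential function and a logarithmic function set the reward value and discount rate dynamically. The paper's own formulas and code are not reproduced in this record, so the Python sketch below is only one plausible, assumed reading of that description; the class name ActionValueAgent, the specific shaping formulas, and all hyperparameters are illustrative rather than the authors' actual implementation.

```python
import math
import random
from collections import defaultdict

class ActionValueAgent:
    """Minimal Q-learning-style agent that keeps a score per (state, action)
    pair and shapes reward and discount dynamically from those scores."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1):
        self.actions = actions              # e.g. ["up", "down", "left", "right"] in a maze
        self.alpha = alpha                  # learning rate
        self.epsilon = epsilon              # exploration probability
        self.scores = defaultdict(float)    # action values, keyed by (state, action)

    def choose(self, state):
        """Epsilon-greedy selection driven by the per-action scores."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.scores[(state, a)])

    def dynamic_reward(self, base_reward, score):
        """Illustrative exponential shaping of the environment reward by the
        current action value (an assumption, not the paper's exact formula)."""
        return base_reward * math.exp(min(score, 5.0))

    def dynamic_discount(self, step):
        """Illustrative logarithmic schedule for the discount rate, growing
        slowly with the number of steps taken so far (also an assumption)."""
        return min(0.99, 0.5 + 0.1 * math.log(1.0 + step))

    def update(self, state, action, base_reward, next_state, step):
        """Standard Q-learning backup using the dynamically shaped quantities."""
        score = self.scores[(state, action)]
        reward = self.dynamic_reward(base_reward, score)
        gamma = self.dynamic_discount(step)
        best_next = max(self.scores[(next_state, a)] for a in self.actions)
        self.scores[(state, action)] += self.alpha * (reward + gamma * best_next - score)
```

In the maze simulation referenced in the abstract, each episode would repeatedly call choose and update until the goal cell is reached; the reported result is that shaping the reward and discount in this way lowers the number of actions executed before convergence.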
Source: Journal of Tongji University (Natural Science) (《同济大学学报(自然科学版)》), 2007, No. 4: 531-536 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (60643001); Program for New Century Excellent Talents in University of the Ministry of Education; Shanghai Shuguang Program (04SG22).
Keywords: reinforcement learning; action values; Q algorithm; reward functions

References (8)

  • 1 Watkins C J C H, Dayan P. Q-learning [J]. Machine Learning, 1992, 8(3-4): 279-292.
  • 2 Li Wei, He Xuesong, Ye Qingtai, Zhu Changming. A reinforcement learning system based on prior knowledge [J]. Journal of Shanghai Jiaotong University, 2004, 38(8): 1362-1365. (Cited by: 5)
  • 3 Jiang Guofei, Gao Huiqi, Wu Cangpu. Convergence analysis of the grid discretization method in Q-learning algorithms [J]. Control Theory & Applications, 1999, 16(2): 194-198. (Cited by: 9)
  • 4 Sutton R S. Open theoretical questions in reinforcement learning [C]// Fischer P, Simon H U (eds). Proceedings of the Fourth European Conference on Computational Learning Theory. [S.l.]: Springer-Verlag, 1999: 11-17.
  • 5 Mataric M J. Reward functions for accelerated learning [C]// Proceedings of the Eleventh International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 1994: 181-189.
  • 6 Ng A Y, Harada D, Russell S. Policy invariance under reward transformations: theory and application to reward shaping [C]// Proceedings of the Sixteenth International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 1999: 278-287.
  • 7 Bonarini A, Bonacina C, Matteucci M. An approach to the design of reinforcement functions in real world, agent-based applications [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 2001, 31(3): 288-301.
  • 8 Wei Yingzi, Zhao Mingyang. Design of heuristic reward functions in reinforcement learning algorithms and analysis of their convergence [J]. Computer Science, 2005, 32(3): 190-193. (Cited by: 13)
