Reinforcement learning algorithm based on information entropy
(一种基于信息熵的强化学习算法)
Cited by: 4
Abstract: To control the balance between exploration and exploitation in reinforcement learning, an algorithm based on information entropy is proposed. Using the concept of information entropy, a new state-importance measure is defined that quantifies how strongly a state is related to the goal. An exploration mechanism built on this measure adaptively adjusts the balance between exploration and exploitation during learning. In addition, by setting a variable threshold on the measure, the state space is pruned autonomously and gradually reduced to a suitable, smaller space, which saves considerable computing resources and accelerates learning. Simulation results show that the proposed algorithm achieves good learning performance.
Source: Systems Engineering and Electronics (《系统工程与电子技术》), indexed in EI and CSCD, Peking University Core journal, 2010, No. 5, pp. 1043-1046 (4 pages).
Funding: Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education (20070288022); Natural Science Foundation of Jiangsu Province (BK2008404); National Key Laboratory of Space Intelligent Control Technology.
Keywords: reinforcement learning; exploration and exploitation; action selection; information entropy
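
This record reproduces only the abstract, without the paper's equations or pseudocode, so the Python sketch below illustrates the general idea rather than the authors' published algorithm: tabular Q-learning in which the per-state exploration probability follows the normalized Shannon entropy of a softmax over that state's action values, and states whose estimated relevance to the goal drops below a variable threshold are pruned from the table. The class and parameter names (EntropyGuidedQLearner, explore_prob, theta) and both concrete measures are assumptions made for this sketch.

import math
import random
from collections import defaultdict


def softmax_entropy(qs, temperature=1.0):
    """Shannon entropy (in nats) of a softmax distribution over Q-values."""
    m = max(qs)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in qs]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)


class EntropyGuidedQLearner:
    """Illustrative stand-in, not the paper's algorithm (n_actions >= 2)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, theta=0.01):
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.theta = theta  # variable pruning threshold (assumed form)
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def explore_prob(self, state):
        # Normalized entropy in [0, 1]: near-uniform action values mean the
        # state is still poorly understood, so explore more there.
        return softmax_entropy(self.q[state]) / math.log(self.n_actions)

    def act(self, state):
        if random.random() < self.explore_prob(state):
            return random.randrange(self.n_actions)  # explore
        return max(range(self.n_actions), key=lambda a: self.q[state][a])

    def update(self, s, a, r, s_next, done):
        # Standard tabular Q-learning update.
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def prune(self):
        # Stand-in for the paper's variable-threshold reduction: drop states
        # whose best value is negligible relative to the table-wide best,
        # i.e. states estimated to be weakly related to the goal.
        best = max((max(v) for v in self.q.values()), default=0.0)
        if best <= 0.0:
            return
        for s in [s for s in self.q if max(self.q[s]) / best < self.theta]:
            del self.q[s]

On a small grid-world task, for example, one would call act and update at every step and prune every few episodes; the threshold theta, or a schedule for raising it, would need tuning per task, and with negative-only rewards the pruning rule above would have to be reformulated.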

