Abstract
To balance exploration and exploitation in reinforcement learning, a reinforcement learning algorithm based on information entropy is proposed. Using the concept of information entropy, a new state importance measure is defined that quantifies how strongly each state is related to the goal. Based on this measure, an exploration mechanism is designed to adaptively adjust the balance between exploration and exploitation during learning. In addition, by setting a variable threshold on the measure, the state space is pruned autonomously, eventually yielding a suitable, smaller state space, which greatly saves computing resources and accelerates learning. Simulation results show that the proposed algorithm achieves good learning performance.
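The abstract describes two ideas: an entropy-derived state measure that drives the exploration/exploitation balance, and threshold-based pruning of the state space. A minimal sketch of how such a mechanism could look, assuming an epsilon-greedy agent whose exploration rate is driven by the entropy of a Boltzmann distribution over Q-values (the function names, the entropy-to-epsilon mapping, and the pruning rule are illustrative assumptions, not the paper's exact formulation):

```python
import math
import random

def entropy(probs):
    """Shannon entropy (base 2) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def softmax(values, tau=1.0):
    """Boltzmann distribution over action values."""
    exps = [math.exp(v / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def exploration_rate(q_values, eps_min=0.05, eps_max=0.5):
    """Map action-distribution entropy to an epsilon:
    high entropy (uninformative Q-values) -> explore more;
    low entropy (one action clearly best) -> exploit more."""
    h = entropy(softmax(q_values))
    h_max = math.log2(len(q_values))  # entropy of the uniform distribution
    return eps_min + (eps_max - eps_min) * (h / h_max)

def select_action(q_values):
    """Epsilon-greedy choice with an entropy-adapted epsilon."""
    if random.random() < exploration_rate(q_values):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def prune_states(importance, threshold):
    """Drop states whose importance measure falls below a
    (possibly variable) threshold, shrinking the state space."""
    return {s: w for s, w in importance.items() if w >= threshold}
```

With uniform Q-values the agent explores at the maximum rate, while a sharply peaked Q-vector pushes epsilon toward its minimum, which is the adaptive behavior the abstract attributes to the entropy measure.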
Source
Systems Engineering and Electronics (《系统工程与电子技术》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2010, No. 5, pp. 1043-1046 (4 pages)
Funding
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20070288022)
Natural Science Foundation of Jiangsu Province (BK2008404)
Supported by the National Key Laboratory of Space Intelligent Control Technology
Keywords
reinforcement learning
exploration and exploitation
action-selection
information entropy