An Improved Fuzzy Sarsa Learning

Abstract: Fuzzy Sarsa learning (FSL) cannot tune its learning rate on-line, and it cannot manage the balance between exploration and exploitation during learning. To address both problems, an improved fuzzy Sarsa learning (IFSL) algorithm is proposed. Building on FSL, IFSL adds an adaptive learning-rate generator that tunes the learning rate on-line and a fuzzy balancer that controls the degree of exploration versus exploitation. A block diagram of IFSL is given, and it is proved that, under a stationary action-selection policy, the adjustable weight vector of IFSL has an equilibrium fixed point, i.e., it converges to a unique value. Simulation results show that IFSL manages the exploration/exploitation balance well and, compared with FSL, converges faster and achieves better learning performance in terms of learning speed and action quality.
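The abstract names IFSL's components but not their equations, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes FSL follows the common fuzzy Q-learning layout: each fuzzy rule holds q-weights for a set of candidate actions, the global action and Q-value are firing-strength-weighted sums over the actions chosen per rule, and the Sarsa TD error updates each chosen weight in proportion to its rule's firing strength. The "adaptive learning rate" (a per rule/action visit-count decay) and the "fuzzy balancer" (a two-rule zero-order Sugeno system mapping a rule's experience to an epsilon-greedy exploration rate) are hypothetical stand-ins; the names `FuzzySarsaSketch`, `epsilon`, `train_corridor` and the toy corridor task are also invented for illustration.

```python
import random


def tri(x, a, b, c):
    """Triangular membership function: feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


class FuzzySarsaSketch:
    """Fuzzy Sarsa on a 1-D state in [0, 1] with two hypothetical IFSL-style add-ons."""

    def __init__(self, n_rules, actions, gamma=0.9, alpha0=0.5, seed=0):
        step = 1.0 / (n_rules - 1)
        self.centers = [k * step for k in range(n_rules)]  # evenly spaced rule centers
        self.width = step
        self.actions = actions                              # candidate actions shared by all rules
        self.gamma = gamma
        self.alpha0 = alpha0
        self.w = [[0.0] * len(actions) for _ in range(n_rules)]   # q-weights per rule/action
        self.visits = [[0] * len(actions) for _ in range(n_rules)]
        self.rng = random.Random(seed)

    def firing(self, s):
        """Normalized firing strengths phi_i(s); valid for s in [0, 1]."""
        mu = [tri(s, c - self.width, c, c + self.width) for c in self.centers]
        total = sum(mu)
        return [m / total for m in mu]

    def epsilon(self, i):
        """Hypothetical fuzzy balancer: inexperienced rule -> explore, experienced -> exploit."""
        x = min(sum(self.visits[i]) / 50.0, 1.0)  # normalized experience of rule i
        low, high = 1.0 - x, x                    # memberships of "few" / "many" visits
        return low * 0.5 + high * 0.05            # zero-order Sugeno output (low + high = 1)

    def select(self, phi):
        """Choose one candidate action index per rule (epsilon-greedy within each rule)."""
        choice = []
        for i, p in enumerate(phi):
            row = self.w[i]
            if p > 0 and self.rng.random() < self.epsilon(i):
                choice.append(self.rng.randrange(len(row)))
            else:
                choice.append(max(range(len(row)), key=row.__getitem__))
        return choice

    def action(self, phi, choice):
        """Global action: firing-strength-weighted mix of the chosen rule actions."""
        return sum(p * self.actions[j] for p, j in zip(phi, choice))

    def q(self, phi, choice):
        """Q(s, a) as the firing-strength-weighted sum of the chosen q-weights."""
        return sum(p * self.w[i][j] for i, (p, j) in enumerate(zip(phi, choice)))

    def update(self, phi, choice, r, phi2=None, choice2=None):
        """One Sarsa step; pass phi2=None for a terminal transition. Returns the TD error."""
        target = r if phi2 is None else r + self.gamma * self.q(phi2, choice2)
        delta = target - self.q(phi, choice)
        for i, (p, j) in enumerate(zip(phi, choice)):
            if p > 0:
                self.visits[i][j] += 1
                # Hypothetical adaptive rate: shrinks as this rule/action pair is revisited.
                alpha = self.alpha0 / (1.0 + 0.1 * self.visits[i][j])
                self.w[i][j] += alpha * delta * p
        return delta


def train_corridor(episodes=200, seed=1):
    """Hypothetical toy task: move along [0, 1] from 0.5; reward 1 on reaching s >= 0.95."""
    agent = FuzzySarsaSketch(n_rules=5, actions=[-0.1, 0.0, 0.1], seed=seed)
    for _ in range(episodes):
        s = 0.5
        phi = agent.firing(s)
        choice = agent.select(phi)
        for _ in range(100):
            s2 = min(max(s + agent.action(phi, choice), 0.0), 1.0)  # keep state in [0, 1]
            if s2 >= 0.95:
                agent.update(phi, choice, 1.0)  # terminal transition
                break
            phi2 = agent.firing(s2)
            choice2 = agent.select(phi2)
            agent.update(phi, choice, -0.01, phi2, choice2)
            s, phi, choice = s2, phi2, choice2
    return agent
```

For example, `train_corridor()` returns a trained agent. Every episode starts at s = 0.5, where only the rule centred at 0.5 fires, so after 200 episodes that rule has been visited at least 200 times and the hypothetical balancer has lowered its exploration rate from 0.5 to 0.05. Exploration is decided per rule, so well-visited regions of the state space shift toward exploitation while rarely visited regions keep exploring.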

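Authors: Chen Xuesong, Yang Yimin

Affiliations: School of Applied Mathematics, Guangdong University of Technology; School of Automation, Guangdong University of Technology

Source: Journal of Beijing University of Posts and Telecommunications, 2011, no. 2, pp. 31-34, 44 (5 pages in total). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

Funding: National Natural Science Foundation of China (60974019); Natural Science Foundation of Guangdong Province (9451009001002686)

Keywords: reinforcement learning; fuzzy control; fuzzy Sarsa learning; exploration; exploitation

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]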

References

[1] Gosavi A. Reinforcement learning: a tutorial survey and recent advances[J]. INFORMS Journal on Computing, 2009, 21(2): 178-192.
[2] Derhami V, Majd V, Ahmadabadi M N. Exploration and exploitation balance management in fuzzy reinforcement learning[J]. Fuzzy Sets and Systems, 2010, 161(4): 578-595.
[3] Alba E, Dorronsoro B. The exploration/exploitation tradeoff in dynamic cellular genetic algorithms[J]. IEEE Transactions on Evolutionary Computation, 2005, 9(2): 126-143.
[4] Tan K C, Chiam S C, Mamun A A. Balancing exploration and exploitation with adaptive variation for evolutionary multi-objective optimization[J]. European Journal of Operational Research, 2009, 197(2): 701-713.
[5] Derhami V, Majd V, Ahmadabadi M N. Fuzzy Sarsa learning and the proof of existence of its stationary points[J]. Asian Journal of Control, 2008, 10(5): 535-549.
[6] Juang C F, Hsu C H. Reinforcement interval type-2 fuzzy controller design by online rule generation and Q-value-aided ant colony optimization[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(6): 1528-1542.