
A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation

Abstract In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, is unable to quickly build a complete and independent description of the entire environment, leaving the learning method to struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, whose configuration and/or parameters are constantly changing. In this paper, the problem is approached by first mapping a reinforcement learning scheme to a directed graph, in which the set of all states already explored continues to be exploited. We prove that the two tasks of exploration and exploitation eventually converge in the decision-making process, and thus there is no need to face the exploration vs. exploitation tradeoff as all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by the modified Floyd-Warshall algorithm proposed in the paper. The experimental results confirm that the proposed graph-based reinforcement learning algorithm significantly outperforms both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice in applications involving dynamic environments.
Source: Computer Modeling in Engineering & Sciences (SCIE, EI), 2019, No. 2, pp. 253-274 (22 pages).
Funding: This research work is supported by the Fujian Province Natural Science Foundation under Grant No. 2018J01553.
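
The abstract frames the learned problem as all-pairs shortest-path search over a directed graph of explored states, solved with a modified Floyd-Warshall algorithm. The paper's specific modifications for dynamic environments are not detailed in the abstract, so the sketch below only illustrates the standard Floyd-Warshall recurrence such a method would build on; the state indices, edge costs, and the `floyd_warshall` helper name are hypothetical.

```python
# Minimal sketch of the classic Floyd-Warshall all-pairs shortest-path
# recurrence over a directed graph of explored states. This is NOT the
# paper's modified variant, only the baseline recurrence it extends.

INF = float("inf")

def floyd_warshall(n, edges):
    """Return an n x n matrix of shortest-path costs.

    n     -- number of explored states, indexed 0..n-1
    edges -- iterable of (u, v, cost) directed transitions
    """
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, cost in edges:
        dist[u][v] = min(dist[u][v], cost)

    # Relax every pair (i, j) through each intermediate state k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

if __name__ == "__main__":
    # Hypothetical 4-state maze fragment: each directed edge is one step.
    transitions = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 2, 3)]
    print(floyd_warshall(4, transitions)[0][3])  # shortest cost from state 0 to state 3 -> 3
```

In a dynamic environment, edge costs change over time, so a practical variant would recompute or incrementally repair the distance matrix as new states and transitions are explored rather than running the full triple loop from scratch each step.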