Data-Driven Q-Learning in Dynamic Environment (动态环境下数据驱动Q-学习算法)

Abstract: In dynamic environments, reinforcement learning struggles to balance the exploration of untried actions against the exploitation of known optimal actions. To address this problem, a data-driven Q-learning algorithm is proposed. The algorithm first constructs a behavior information system for the agent and builds an environment trigger mechanism from the uncertainty of the knowledge in that system. By tracking dynamic information about environmental change, the trigger mechanism adaptively controls exploration of the new environment, so that exploration of untried actions and exploitation of known optimal actions are balanced. In simulations of maze (grid-world) navigation in a dynamic environment, the average number of steps needed to reach the goal was 7.79% to 84.7% lower than with Q-learning, simulated annealing Q-learning (SAQ), and recency-based exploration (RBE) Q-learning.
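The idea of a trigger that raises exploration when the environment appears to have changed can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how the behavior information system or its uncertainty measure is computed, so the trigger below is a hypothetical stand-in that fires when a remembered transition no longer matches what is observed. The class name `TriggeredQAgent`, the helper `step`, and all parameter values are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Grid moves as (row delta, column delta): right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


class TriggeredQAgent:
    """Tabular Q-learning whose exploration rate is reset by a change trigger.

    Hypothetical stand-in: the trigger compares each observed transition with
    the last one remembered for the same (state, action) pair. The paper builds
    its trigger from knowledge uncertainty in a behavior information system,
    which the abstract does not describe in detail.
    """

    def __init__(self, alpha=0.1, gamma=0.9, eps_low=0.05, eps_high=0.5, decay=0.99):
        self.q = defaultdict(float)   # (state, action index) -> estimated value
        self.memory = {}              # (state, action index) -> last next state
        self.alpha, self.gamma = alpha, gamma
        self.eps_low, self.eps_high, self.decay = eps_low, eps_high, decay
        self.eps = eps_high           # start with heavy exploration

    def act(self, state, rng=random):
        """Epsilon-greedy action selection using the current exploration rate."""
        if rng.random() < self.eps:
            return rng.randrange(len(ACTIONS))
        values = [self.q[(state, a)] for a in range(len(ACTIONS))]
        return values.index(max(values))

    def update(self, state, action, reward, next_state):
        """Apply one Q-learning update; return True if the trigger fired."""
        key = (state, action)
        changed = key in self.memory and self.memory[key] != next_state
        self.memory[key] = next_state
        if changed:
            self.eps = self.eps_high  # environment seems to have changed: explore
        else:
            self.eps = max(self.eps_low, self.eps * self.decay)  # settle down
        best_next = max(self.q[(next_state, a)] for a in range(len(ACTIONS)))
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
        return changed


def step(state, action, walls, size, goal):
    """Move in a size x size grid; walls and borders leave the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < size and 0 <= c < size) or (r, c) in walls:
        r, c = state
    nxt = (r, c)
    reward = 1.0 if nxt == goal else -0.01
    return nxt, reward, nxt == goal
```

In a dynamic maze, changing the `walls` set between episodes makes some remembered transitions stale; the first time the agent meets one, the exploration rate jumps back to `eps_high` and then decays again while observations stay consistent.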

Authors: SHEN Yuanxia (申元霞), WANG Guoyin (王国胤)

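Affiliations: School of Information Science and Technology, Southwest Jiaotong University; Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications; School of Computer Science, Chongqing University of Arts and Sciences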

Source: Journal of Southwest Jiaotong University (《西南交通大学学报》), 2009, No. 6, pp. 877-881 (5 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (60573068, 60773113); Natural Science Foundation of Chongqing (2008BA2017)

Keywords: reinforcement learning; data-driven; Q-learning; uncertainty

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]
