Actor-Critic Algorithm Based on Tile Coding and Model Learning

Cited by: 3

Abstract: Actor-Critic (AC) methods are a class of reinforcement learning methods with good performance and convergence guarantees. However, the agent does not learn the dynamics of the environment while it learns and improves its policy, which limits the performance of AC methods. In addition, AC methods must represent the policy and the value function approximately, and both the encoding of states and actions and the associated parameters strongly affect how well AC methods work. Tile Coding is simple to use and has low computational time complexity. We therefore combine Tile Coding with a model-based Actor-Critic method and apply the resulting algorithm to reinforcement learning simulation experiments. The experimental results show that the resulting algorithm performs well.
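The abstract names Tile Coding as the state encoding but does not show it, so a minimal illustration may help. The sketch below is not the authors' implementation: it assumes a one-dimensional state, uniformly shifted tilings, and a linear value estimate of the kind a critic could use. The names `tile_indices`, `value`, and `td_update` and all parameter defaults are hypothetical.

```python
import math


def tile_indices(x, low, high, n_tilings=8, n_tiles=10):
    """Return one active tile index per tiling for a scalar state x in [low, high]."""
    # Rescale x so that a single tile has width 1.0.
    scaled = (x - low) / (high - low) * n_tiles
    active = []
    for t in range(n_tilings):
        # Assumption: tiling t is shifted by t / n_tilings of a tile width.
        offset = t / n_tilings
        # Each tiling holds n_tiles + 1 tiles so the shifted grid still covers the range.
        idx = min(max(int(math.floor(scaled + offset)), 0), n_tiles)
        active.append(t * (n_tiles + 1) + idx)
    return active


def value(weights, active):
    """Linear value estimate: the sum of the weights of the active tiles."""
    return sum(weights[i] for i in active)


def td_update(weights, active, target, alpha=0.1):
    """Move the estimate toward target; the step is shared across the active tiles."""
    error = target - value(weights, active)
    step = alpha / len(active)
    for i in active:
        weights[i] += step * error
    return error
```

With the defaults, a weight vector of length `8 * 11 = 88` is enough. Nearby states activate mostly the same tiles, so an update at one state also changes the estimate at its neighbours, while distant states share no tiles and are unaffected.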

Authors: Jin Yujing (金玉净), Zhu Wenwen (朱文文), Fu Yuchen (伏玉琛), Liu Quan (刘全)

Affiliation: School of Computer Science and Technology, Soochow University

Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journals, 2014, No. 6, pp. 239-242, 249 (5 pages in total)

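Funding: Supported by the National Natural Science Foundation of China (61070122, 61373094, 61070223, 61103045), the Natural Science Foundation of Jiangsu Province (BK2009116), and the Natural Science Research Project of Jiangsu Higher Education Institutions (09KJA520002)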

Keywords: Reinforcement learning; Tile Coding; Actor-Critic; Model learning; Function approximation

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
