Option automatic generation algorithm based on ACCA (基于ACCA的Option自动生成算法)

Cited by: 1

Abstract: A new algorithm for automatically generating Options in hierarchical reinforcement learning (HRL) is presented. The algorithm takes as input the state space explored by the Agent during the initial learning phase and clusters those states with an improved Ant Colony Clustering Algorithm (ACCA). On each of the resulting state subsets, a set of intra-option policies is learned by experience replay, and the Options are generated from them. Simulation experiments demonstrate that the algorithm is effective.
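The last two steps of the abstract (learning an intra-option policy on each state cluster by experience replay, then packaging it as an Option) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 6-state corridor, the hand-supplied cluster labels standing in for the ACCA output, the pseudo-reward of 1 for leaving a cluster, the tabular Q-learning update, and all function names. The clustering step itself is not reproduced.

```python
from collections import defaultdict

ACTIONS = (-1, +1)   # toy action set: move left / move right (assumption)
N_STATES = 6         # toy corridor with states 0..5 (assumption)

def step(s, a):
    """Deterministic toy dynamics: move along the corridor, clamped at the walls."""
    return min(max(s + a, 0), N_STATES - 1)

def explore():
    """Stand-in for the initial exploration phase: store (s, a, s') transitions.
    Here every state-action pair is simply tried once."""
    return [(s, a, step(s, a)) for s in range(N_STATES) for a in ACTIONS]

def learn_option(cluster, transitions, gamma=0.9, alpha=0.5, sweeps=200):
    """Build one Option from a state cluster by replaying stored transitions."""
    cluster = frozenset(cluster)
    # Only experience that starts inside the cluster is replayed.
    replay = [t for t in transitions if t[0] in cluster]
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, s2 in replay:
            if s2 in cluster:
                # Still inside the cluster: bootstrap from the next state.
                target = gamma * max(q[(s2, b)] for b in ACTIONS)
            else:
                # Hypothetical pseudo-reward for leaving the cluster.
                target = 1.0
            q[(s, a)] += alpha * (target - q[(s, a)])
    # Greedy intra-option policy over the learned Q-values.
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in cluster}
    return {
        "initiation": cluster,                    # where the Option may start
        "policy": policy,                         # intra-option policy
        "terminates": lambda s: s not in cluster  # stop once outside the cluster
    }

transitions = explore()
clusters = [{0, 1, 2}, {3, 4, 5}]  # assumed clustering result, not computed by ACCA
options = [learn_option(c, transitions) for c in clusters]
```

In this toy setting each Option learns to walk toward the boundary of its own cluster, which is the usual intuition behind cluster-based Options: the initiation set is the cluster, and termination happens when the agent crosses into another cluster.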

Authors: Hu Minghui (胡明辉), Yin Changming (殷苌茗), Li Liyun (李立云)

Affiliation: School of Computer and Communication Engineering, Changsha University of Science and Technology

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list; 2008, No. 19, pp. 39-40, 49 (3 pages)

Keywords: hierarchical reinforcement learning; Option; Ant Colony Clustering Algorithm (ACCA); experience replay

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]


