APPLICATION OF HIERARCHICAL REINFORCEMENT LEARNING IN ENGINEERING DOMAIN (Cited by: 3)

Authors: WEI Li, YE Qingtai, ZHU Changming

Affiliation: College of Machine & Dynamics Engineering, Shanghai Jiao Tong University

Source: Journal of Systems Science and Systems Engineering (English edition; indexed in SCIE, EI, CSCD), 2005, No. 2, pp. 207-217 (11 pages)

Funding: This work was supported partly by the National Natural Science Foundation of China under Grant No. 69975013

Keywords: engineering domain knowledge, controller, reinforcement learning, elevator group control

Classification code: TU857 [Building Science]

Citation network

References (11)

[1] Bao, G., C. G. Cassandras, T. E. Djaferis, A. D. Gandhi, and D. P. Looze, "Elevator dispatchers for down peak traffic", ECE Department Technical Report, University of Massachusetts, 1994.
[2] Barto, A. G. and S. Mahadevan, "Recent advances in hierarchical reinforcement learning", Discrete Event Dynamic Systems: Theory and Applications, Vol. 13, pp. 41-77, 2003.
[3] Bradtke, S. J. and M. O. Duff, "Reinforcement learning methods for continuous-time Markov decision problems", Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995.
[4] Crites, R. H. and A. G. Barto, "Improving elevator performance using reinforcement learning", Advances in Neural Information Processing Systems 8, pp. 1017-1023, 1996.
[5] Mahadevan, S., M. Nicholas, D. Tapas, and G. Abhijit, "Self-improving factory simulation using continuous-time average-reward reinforcement learning", Proceedings of the 14th International Conference on Machine Learning (ICML '97), Nashville, TN, 1997.
[6] Mataric, M., "Reinforcement learning in the multi-robot domain", Autonomous Robots, Vol. 4, No. 1, pp. 73-83, 1997.
[7] Parr, R., "Hierarchical control and learning for Markov decision processes", Ph.D. dissertation, University of California, Berkeley, CA, 1998.
[8] Rajbala, M., M. Sridhar, and G. Mohammad, "Hierarchical multi-agent reinforcement learning", Proceedings of the Fifth International Conference on Autonomous Agents, pp. 246-253, 2001.
[9] Sutton, R. S. and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, 1998.
[10] Szepesvari, C. and M. L. Littman, "A unified analysis of value-function-based reinforcement learning algorithms", Neural Computation, Vol. 11, pp. 2017-2060, 1999.

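Co-cited literature (46)

[1] Su Chang, Gao Yang, Chen Shifu, Chen Zhaoqian. Research on an algorithm for autonomously generating options in an SMDP environment [J]. Pattern Recognition and Artificial Intelligence, 2005, 18(6): 679-684. Cited by: 9
[2] Peng Zhiping, Peng Hong, Zheng Qilun. Research on a bilateral multi-issue autonomous negotiation model [J]. Journal of Electronics & Information Technology, 2007, 29(3): 733-738. Cited by: 12
[3] Gao Yang, Zhou Ruyi, Wang Hao, Cao Zhixin. Research on average-reward reinforcement learning algorithms [J]. Chinese Journal of Computers, 2007, 30(8): 1372-1378. Cited by: 38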
[4] FISCHER F, ROVATSOS M, WEISS G. Hierarchical reinforcement learning in communication-mediated multiagent coordination [C] // Proc of the 3rd International Conference on Autonomous Agents and Multiagent Systems. New York: ACM Press, 2004.
[5] HENGST B. Discovering hierarchy in reinforcement learning [D]. Sydney: University of New South Wales, 2003.
[6] SKELLY M M. Hierarchical reinforcement learning with function approximation for adaptive control [D]. Ohio: Case Western Reserve University, 2004.
[7] UTHER W T B. Tree based hierarchical reinforcement learning [D]. Pittsburgh: Carnegie Mellon University, 2002.
[8] BELLMAN R E, DREYFUS S E. Applied dynamic programming [M]. New Jersey: Princeton University Press, 1962.
[9] WATKINS C, DAYAN P. Q-learning [J]. Machine Learning, 1992, 8(3): 279-292.
[10] PARR R. Hierarchical control and learning for Markov decision processes [D]. Berkeley, California: University of California, 1998.

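Citing literature (3)

[1] Peng Zhiping, Li Shaoping. A PSO-based hierarchical policy search algorithm [J]. Pattern Recognition and Artificial Intelligence, 2008, 21(1): 98-103. Cited by: 1
[2] Peng Zhiping, Li Shaoping. Research progress in hierarchical reinforcement learning [J]. Application Research of Computers, 2008, 25(4): 974-978. Cited by: 7
[3] Tang Hao, Zhang Xiaoyan, Han Jianghong, Zhou Lei. An Option algorithm based on continuous-time semi-Markov decision processes [J]. Chinese Journal of Computers, 2014, 37(9): 2027-2037. Cited by: 2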

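Second-level citing literature (10)

[1] Song Jiong, Jin Zhao, Yang Weihe. A function method for accelerating reinforcement learning in machine learning [J]. Journal of Yunnan University (Natural Sciences Edition), 2011, 33(S2): 176-181.
[2] Lian Zuozheng, Wang Haizhen, Deng Wenxin, Teng Yanping. Research on agent negotiation using memory-based evolutionary learning [J]. Computer Engineering and Applications, 2009, 45(19): 131-133. Cited by: 1
[3] Dai Zhaohui, Yuan Jiaohong, Wu Min, Chen Xin. Dynamic hierarchical reinforcement learning based on probability models [J]. Control Theory & Applications, 2011, 28(11): 1595-1600. Cited by: 2
[4] Li Zhi, Hu Kun, Yu Xueli. Research on hierarchical reinforcement learning based on a semi-Markov game model [J]. Computer Engineering and Design, 2012, 33(9): 3558-3562. Cited by: 2
[5] Tang Hao, Zhang Xiaoyan, Han Jianghong, Zhou Lei. An Option algorithm based on continuous-time semi-Markov decision processes [J]. Chinese Journal of Computers, 2014, 37(9): 2027-2037. Cited by: 2
[6] Peng Zhiping, Zhou Xiaoke, Sun Zhiyi. An adaptive virtual machine configuration method combining Options and the ant colony algorithm [J]. Journal of Chinese Computer Systems, 2015, 36(4): 801-805.
[7] Wang Lei. A method for constructing abstract action trees based on example trajectories [J]. Computer and Modernization, 2016(6): 85-90. Cited by: 1
[8] Zhu Fei, Xu Zhipeng, Liu Quan, Fu Yuchen, Wang Hui. An online hierarchical reinforcement learning method based on interruptible Options [J]. Journal on Communications, 2016, 37(6): 65-74. Cited by: 4
[9] Guo Lexin, Zhang Xiaoshun, Tan Min, Yu Tao. An optimal carbon-energy combined-flow algorithm for power grids based on swarm-intelligence reinforcement learning [J]. Electrical Measurement & Instrumentation, 2017, 54(1): 1-7. Cited by: 4
[10] Cao Jie, Shao Zixuan, Hou Liang. Research on the U-turn problem of autonomous vehicles based on hierarchical reinforcement learning [J]. Application Research of Computers, 2022, 39(10): 3008-3012. Cited by: 1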
