
Study on Policy Gradient Approach for Traffic Self-adaptive Control Under POMDP Environment
(Chinese title: POMDP环境下交通信号自适应控制的策略梯度学习方法; cited by: 2)
Abstract: This paper treats adaptive traffic signal control as a POMDP (Partially Observable Markov Decision Process) problem. Each signalized intersection is modeled by a TSCA (Traffic Signal Control Agent), and the TSCA's POMDP model is built and then transformed into an MDP. A policy gradient learning algorithm that draws on the advantages of value function methods is designed to solve the problem. Simulation experiments comparing it with traditional traffic signal control methods indicate that the learning method shows a degree of convergence and effectiveness both when local traffic is light and under highly saturated traffic conditions, and that it has some applicability to adaptive traffic control problems.
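Author: Xia Xinhai (夏新海)
Source: Journal of Wuhan University of Technology (《武汉理工大学学报》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2012, No. 7, pp. 51-56 (6 pages)
Funding: Natural Science Foundation of Jiangxi Province (2010GQS0076); Natural Science Foundation of Guangzhou Maritime College (201112B02)
Keywords: POMDP; reinforcement learning; policy gradient; traffic signal control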
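
The abstract names the ingredients (a partially observed intersection, a policy gradient learner, and help from a value-function-style estimate) but not the algorithm's details. The following is only a minimal sketch of that general idea, not the paper's TSCA model or its algorithm. Everything in it is an assumption made for illustration: the `ToyIntersection` class, its arrival rates, the three-level detector reading, the tabular softmax policy, the eligibility trace, the running-average reward baseline, and all step sizes.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (max-shifted for stability)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyIntersection:
    """Hypothetical two-approach intersection with coarse detector readings."""

    def __init__(self, rng, arrival=(0.3, 0.3), max_queue=20):
        self.rng = rng
        self.arrival = arrival      # assumed per-step arrival probabilities
        self.max_queue = max_queue
        self.queues = [0, 0]        # true state, hidden from the agent

    def reset(self):
        self.queues = [0, 0]
        return self.observe()

    def observe(self):
        # Partial observation per approach: 0 = empty, 1 = 1 to 3 vehicles, 2 = 4 or more.
        return tuple(min(2, (q + 2) // 3) for q in self.queues)

    def step(self, action):
        # Random arrivals on both approaches, then green for the chosen approach.
        for i in range(2):
            if self.rng.random() < self.arrival[i]:
                self.queues[i] = min(self.max_queue, self.queues[i] + 1)
        if self.queues[action] > 0:
            self.queues[action] -= 1
        reward = -float(sum(self.queues))  # fewer waiting vehicles is better
        return self.observe(), reward


def train(steps=20000, alpha=0.01, beta=0.9, seed=0):
    """Online policy-gradient updates for a memoryless policy over observations."""
    rng = random.Random(seed)
    env = ToyIntersection(rng)
    theta = {}      # (observation, action) -> preference
    trace = {}      # eligibility trace over the same keys
    baseline = 0.0  # running average reward, a crude value estimate
    obs = env.reset()
    for _ in range(steps):
        probs = softmax([theta.get((obs, a), 0.0) for a in (0, 1)])
        action = 0 if rng.random() < probs[0] else 1
        # Decay the trace, then add grad log pi(action | obs) for a tabular softmax.
        for key in trace:
            trace[key] *= beta
        for a in (0, 1):
            grad = (1.0 if a == action else 0.0) - probs[a]
            trace[(obs, a)] = trace.get((obs, a), 0.0) + grad
        obs, reward = env.step(action)
        baseline += 0.01 * (reward - baseline)
        # Move preferences along the trace, scaled by reward relative to the baseline.
        for key, z in trace.items():
            theta[key] = theta.get(key, 0.0) + alpha * (reward - baseline) * z
    return theta


if __name__ == "__main__":
    learned = train()
    # Probability of each phase when approach 0 reads "long" and approach 1 reads "empty".
    print(softmax([learned.get(((2, 0), a), 0.0) for a in (0, 1)]))
```

The policy conditions only on the detector reading, never on the true queue lengths, which is what makes the toy problem partially observable. Subtracting the running-average reward is one simple way to borrow from value-function methods to reduce gradient variance; the paper may combine the two quite differently.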


