Abstract
This paper treats adaptive traffic signal control as a partially observable Markov decision process (POMDP). A Traffic Signal Control Agent (TSCA) is assigned to each signalized intersection, and a POMDP environment model of the intersection is built and then transformed into an MDP. A policy gradient learning algorithm that incorporates the advantages of value-function methods is designed to solve the problem. Simulation experiments comparing it with traditional traffic signal control methods indicate that the learning method shows a degree of convergence and effectiveness under both light local traffic and highly saturated traffic conditions, and that it is applicable to adaptive traffic control problems.
Source
《武汉理工大学学报》
CAS
CSCD
PKU Core (Peking University Core Journals)
2012, No. 7, pp. 51-56 (6 pages)
Journal of Wuhan University of Technology
Funding
Natural Science Foundation of Jiangxi Province (2010GQS0076)
Natural Science Foundation of Guangzhou Maritime College (201112B02)