Abstract
A path planning method based on Q-Learning was proposed to address the problems that traditional algorithms plan aircraft taxi paths with low accuracy and cannot plan paths according to the overall surface operation situation. By analyzing the network structure model of the airport flight zone and the simulation environment for reinforcement learning, the state space and action space were set, and the reward function was designed according to the compliance and rationality of the path, with the path rationality evaluation value defined as the reciprocal of the product of the taxi path length and the average taxi time in the flight zone. Finally, the influence of the action selection strategy parameters on the path planning model was analyzed. The results showed that, compared with the A* algorithm and the Floyd algorithm, the Q-Learning-based path planning achieved the shortest taxi distance while avoiding relatively busy areas, resulting in a high path rationality evaluation value.
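The abstract describes the method only at a high level. As a minimal illustrative sketch (not the authors' implementation), the Python snippet below shows how tabular Q-Learning with an ε-greedy action selection strategy and a terminal reward based on the path rationality evaluation value (the reciprocal of the product of taxi path length and average taxi time) could be put together on a toy taxiway graph. The graph, edge lengths, average taxi times, hyperparameters, and all names (GRAPH, AVG_TAXI_TIME, rationality, etc.) are assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Toy taxiway graph (assumed for illustration): node -> {neighbor: segment length in metres}.
GRAPH = {
    "A": {"B": 300, "C": 500},
    "B": {"A": 300, "D": 400},
    "C": {"A": 500, "D": 200},
    "D": {"B": 400, "C": 200, "GOAL": 350},
    "GOAL": {},
}
# Assumed average taxi time (s) on the segment entering each node, standing in
# for how busy that part of the flight zone is.
AVG_TAXI_TIME = {"A": 60, "B": 90, "C": 40, "D": 50, "GOAL": 30}

ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.2, 2000
START, GOAL = "A", "GOAL"
Q = defaultdict(float)  # tabular action values, keyed by (state, action)

def rationality(length, time):
    """Path rationality evaluation value: reciprocal of (taxi path length x average taxi time)."""
    return 1.0 / (length * time)

def choose_action(state):
    """Epsilon-greedy selection over the neighbours of the current node."""
    actions = list(GRAPH[state])
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

for _ in range(EPISODES):
    state, length, time = START, 0.0, 0.0
    while state != GOAL:
        action = choose_action(state)
        length += GRAPH[state][action]
        time += AVG_TAXI_TIME[action]
        # Per-step penalty keeps paths short; the terminal reward scores the whole
        # path by its (scaled) rationality evaluation value.
        reward = 1e6 * rationality(length, time) if action == GOAL else -1.0
        best_next = max((Q[(action, a)] for a in GRAPH[action]), default=0.0)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = action

# Greedy rollout of the learned policy (step cap avoids cycling if training was too short).
state, path = START, [START]
for _ in range(len(GRAPH)):
    if state == GOAL:
        break
    state = max(GRAPH[state], key=lambda a: Q[(state, a)])
    path.append(state)
print(" -> ".join(path))
```

With these assumed numbers, the two candidate routes from A to GOAL have the same length but different average taxi times, so the learned policy prefers the less busy one, which mirrors the behaviour the abstract attributes to the Q-Learning planner.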
Authors
WANG Xinglong; WANG Ruifeng (College of Air Traffic Management, CAUC, Tianjin 300300, China)
Source
Journal of Civil Aviation University of China, 2024, No. 3, pp. 28-33 (6 pages)
Funding
National Key Basic Research and Development Program of China (2020YFB1600101).
Scientific Research Project of Tianjin Municipal Education Commission (2020ZD01).