Abstract
With the rapid growth of China's civil aviation industry, flight delays have become both more frequent and more severe, and airlines have a correspondingly stronger need to predict delay times. This paper uses a random forest regression algorithm to predict flight delays. The raw data come from the U.S. Bureau of Transportation Statistics (BTS). First, the raw data are processed: features are selected by analyzing the factors that affect flight arrival time, and the data are cleaned. Then the model is trained, and grid search with cross-validation is used to select the optimal parameters. Finally, a comparison with support vector machine regression and ridge regression shows that random forest gives better predictions. The experimental results show an R-squared of 0.91 and a mean absolute error (MAE) of 10.56 minutes for delay prediction.
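The workflow summarized in the abstract (tune a random forest regressor with grid search and cross-validation, then compare it against support vector regression and ridge regression on R-squared and MAE) can be sketched as follows. This is only an illustrative sketch, not the authors' code: it assumes scikit-learn, uses synthetic stand-in data instead of the BTS records, and the feature names (`dep_delay`, `distance`, `dep_hour`) and the parameter grid are hypothetical choices that the abstract does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): the study itself uses BTS flight records.
rng = np.random.default_rng(0)
n = 600
dep_delay = rng.exponential(scale=20.0, size=n)        # departure delay, minutes
distance = rng.uniform(100.0, 2500.0, size=n)          # flight distance, miles
dep_hour = rng.integers(0, 24, size=n).astype(float)   # scheduled departure hour
X = np.column_stack([dep_delay, distance, dep_hour])
# Hypothetical target: arrival delay driven by departure delay, a time-of-day term, and noise.
y = dep_delay + 5.0 * np.sin(dep_hour / 24.0 * 2.0 * np.pi) + rng.normal(0.0, 5.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid search with 5-fold cross-validation over an illustrative parameter grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

# The tuned forest is compared with two baselines left at their default settings.
models = {
    "random_forest": search.best_estimator_,
    "svr": SVR().fit(X_train, y_train),
    "ridge": Ridge().fit(X_train, y_train),
}

results = {}
for name, model in models.items():
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }
    print(f"{name}: R2={results[name]['r2']:.3f}, MAE={results[name]['mae']:.2f} min")
```

The scores printed on this synthetic data say nothing about the paper's reported figures; reproducing those would require the cleaned BTS data and the authors' actual feature set.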
Authors
刘中祥
王欣
LIU Zhong-xiang; WANG Xin (College of Computer Science, Civil Aviation Flight University of China, Guanghan 618307)
Source
Modern Computer (《现代计算机》)
2019, No. 15, pp. 20-24 (5 pages)
Funding
College Student Innovation and Entrepreneurship Training Program (No. 201810624153)
Keywords
Flight Delay Prediction
Regression Prediction Analysis
Random Forest Regression