Abstract
Because the class distribution of flight delay datasets is uneven, the performance of traditional classifiers is limited to some extent. To predict arrival flight delays accurately, a flight delay prediction model based on the synthetic minority oversampling technique (SMOTE) and conditional generative adversarial nets (CGAN) is proposed. First, SMOTE is used to oversample the original dataset, and the result is fused with a specified sample set generated by a trained CGAN, which alleviates the small sample sizes of some classes and the class imbalance of the original dataset. Second, an XGBoost model is trained, with hyperparameter optimization, on training sets built in four modes. Finally, its performance is compared against K-nearest neighbors, support vector machine and random forest as baseline models. The experiments show that training the classifier on the fused sample set improves the model's generalization to some extent, most noticeably for the light-delay and moderate-delay classes. Compared with the approach without fusion, the macro-averaged Precision, Recall and F1-score increase by 0.16, 0.29 and 0.24 percentage points, respectively. The results indicate that the method can effectively model imbalanced flight delay data and, while keeping overall model performance high, markedly improve the prediction of minority classes, providing a basis for decision-making by air traffic control, airlines and airports.
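The abstract names SMOTE as the first step but gives no settings. As an illustration only, the core SMOTE idea can be sketched in plain Python: for a minority-class sample, pick one of its k nearest minority-class neighbours and create a new point at a random position on the segment between the two. The function names `smote` and `_k_nearest`, the defaults `k=5` and `seed=0`, and the choice to draw seed samples at random with replacement (a common variant, rather than a fixed number of synthetic points per sample) are assumptions, not the authors' configuration.

```python
import math
import random


def _k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style linear interpolation.

    Hypothetical helper for illustration; `minority` is a list of equal-length
    numeric feature vectors that all belong to the same (minority) class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    neighbours = [_k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a seed sample at random
        j = rng.choice(neighbours[i])      # pick one of its k nearest minority neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.4], [1.2, 2.1]]
    for point in smote(minority, n_new=3, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by that class; generating the specified extra samples with a trained CGAN, as the abstract describes, is a separate step not shown here.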
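The reported gains are macro-averaged, meaning each delay class counts equally regardless of its size, which is why minority-class improvements show up in these scores. A minimal sketch of the computation follows, assuming per-class scores are averaged without weights and that an undefined ratio (no predictions or no true samples for a class) is scored as 0.0. The function name `macro_prf` and the four class labels in the usage example are hypothetical; the abstract mentions light and moderate delay classes but does not list the full class scheme.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) as unweighted means of per-class scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    y_true = ["none", "none", "none", "light", "moderate", "severe"]
    y_pred = ["none", "none", "light", "light", "none", "severe"]
    print(macro_prf(y_true, y_pred))
```

Here macro F1 is the mean of per-class F1 values, not the F1 of macro precision and recall; the two can differ. A change stated in percentage points is an absolute difference in these scores on a 0 to 100 scale, whereas a change stated in percent is relative to the baseline value.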
Authors
LIU Bo
LU Ting-ting
ZHANG Zhao-ning
ZHANG Jian-bin
(College of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China; System Operations Center of China Southern Airlines Co., Ltd., Guangzhou 510000, China)
Source
Science Technology and Engineering
Peking University core journal
2021, No. 34, pp. 14843-14852 (10 pages)
Funding
National Key Research and Development Program of China (2020YFB1600100)
Fundamental Research Funds for the Central Universities, Civil Aviation University of China special project (3122017061)