Funding: This work was supported by the National Natural Science Foundation of China (No. 61903187) and the Nanjing University of Aeronautics and Astronautics Graduate Innovation Base (Laboratory) Open Fund (No. kfjj20190732).
Abstract: Air traffic complexity is a critical indicator for air traffic operations and plays an important role in air traffic management (ATM) tasks such as airspace reconfiguration, air traffic flow management, and the allocation of air traffic controllers (ATCos). Recently, many machine learning techniques have been used to evaluate air traffic complexity by constructing a mapping from complexity-related factors to air traffic complexity labels. However, the low quality of complexity labels, referred to as label noise, has often been neglected, leading to unsatisfactory performance in air traffic complexity evaluation. This paper addresses label noise in air traffic complexity samples and proposes an approach based on confident learning and XGBoost to evaluate air traffic complexity under label noise. Confident learning is applied to filter out noisy samples with various label probability distributions, and XGBoost is used to train a robust, high-performance evaluation model on datasets filtered at different label-noise removal ratios. Experiments on a real dataset from the Guangzhou airspace sector in China show that an appropriate label noise removal strategy combined with the XGBoost algorithm can effectively mitigate the label noise problem and achieve better performance in air traffic complexity evaluation.
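The abstract does not spell out how noisy samples are identified, so the following is only a minimal sketch of the thresholding rule commonly associated with confident learning, not the authors' implementation. It assumes out-of-sample predicted probabilities are already available for every sample. The function names `class_thresholds` and `find_suspect_indices` and the toy data are hypothetical. A sample is flagged when a class other than its given label is the most probable among the classes whose probability reaches that class's average self-confidence.

```python
def class_thresholds(labels, probs, n_classes):
    """Per-class threshold: mean predicted probability of class j
    over the samples whose given (possibly noisy) label is j."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for y, p in zip(labels, probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled samples gets threshold 1.0 (an assumption
    # made here so that it is almost never treated as confident).
    return [sums[j] / counts[j] if counts[j] else 1.0 for j in range(n_classes)]


def find_suspect_indices(labels, probs, n_classes):
    """Return indices of samples whose given label disagrees with the
    most probable class among those that clear their class threshold."""
    thresholds = class_thresholds(labels, probs, n_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no class is confident enough; keep the sample
        best = max(confident, key=lambda j: p[j])
        if best != y:
            suspects.append(i)
    return suspects


# Toy data (hypothetical): two complexity classes, six samples.
labels = [0, 0, 0, 1, 1, 1]
probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],  # labeled 0, but the model is confident it is class 1
    [0.10, 0.90],
    [0.30, 0.70],
    [0.85, 0.15],  # labeled 1, but the model is confident it is class 0
]
suspects = find_suspect_indices(labels, probs, n_classes=2)
clean_indices = [i for i in range(len(labels)) if i not in suspects]
```

In a pipeline like the one the abstract describes, the samples at `clean_indices` (or a chosen fraction of the flagged ones removed) would then be passed to the classifier for training; that training step is not shown here.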
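Abstract: Objective. With the development of large-scale data acquisition in real-world applications and the steadily rising cost of data annotation, self-supervised learning has become an important strategy for analyzing massive data. However, how to extract useful supervisory information from massive data, and how to learn effectively under that supervision, remain open difficulties that constrain progress in this direction. To address this, a self-supervised ensemble clustering framework based on consensus graph learning is proposed. Method. The framework consists of three functional modules. First, a consensus graph is constructed from multiple base learners in an ensemble. Second, a graph neural network analyzes the consensus graph to capture optimized node representations and the cluster structure of the nodes, and a high-confidence subset of nodes, together with their class labels, is selected from the clusters to generate supervisory information. Third, under this label supervision and together with the remaining unlabeled samples, the base learners that make up the ensemble are updated. These modules are iterated alternately, ultimately improving unsupervised clustering performance. Result. To verify the effectiveness of the framework, a series of experiments was designed on standard datasets covering image and text data. The results show that the proposed method consistently outperforms existing clustering methods. In particular, on MNIST-Test (Modified National Institute of Standards and Technology database), the method achieves an accuracy of 97.78%, which is 3.85% higher than the best existing method. Conclusion. The method uses graph representation learning to improve the capture of supervisory information in self-supervised learning; the effective acquisition of supervisory information in turn strengthens the construction of ensemble members, and ultimately improves the ability to uncover the intrinsic structure of massive unlabeled data.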
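The abstract does not state how the consensus graph is built from the base learners. One common construction in ensemble clustering is a co-association matrix, in which the weight of edge (i, j) is the fraction of base clusterings that place samples i and j in the same cluster. The sketch below illustrates that construction under this assumption only; the names `co_association` and `confident_edges`, the threshold value, and the toy partitions are hypothetical. The graph neural network and the base-learner update are not shown.

```python
def co_association(partitions):
    """Build a consensus graph as a dense matrix.

    partitions: list of label lists, one per base clustering, all of
    the same length n. Entry [i][j] is the fraction of base clusterings
    that assign samples i and j to the same cluster.
    """
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


def confident_edges(graph, threshold):
    """Return the pairs (i, j), i < j, whose consensus weight is at
    least `threshold`, i.e. the pairs most base clusterings agree on."""
    n = len(graph)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if graph[i][j] >= threshold]


# Toy example (hypothetical): three base clusterings of four samples.
# Cluster ids are arbitrary per clustering; only co-membership matters.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
graph = co_association(partitions)
strong = confident_edges(graph, threshold=0.6)
```

Because only co-membership is compared, the construction is insensitive to how each base clustering numbers its clusters, which is why the third partition above agrees with the first despite using swapped ids.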