Abstract
To address inaccurate classification in civil aviation unsafe event records, this paper proposes a machine-learning-based data cleaning scheme. First, an outlier screening model based on One-Class SVM is designed to identify records in the existing unsafe event data whose occurrence-phase labels are wrong. Second, a BERT-based classification model is trained on the correctly labeled records, and the trained model is then used to reclassify the records flagged as anomalous. Finally, the cleaned data are compared with the original data: label accuracy after cleaning is 10.2% higher. The experimental results show that the machine-learning-based method can clean civil aviation safety information and improve data quality.
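The two-stage idea in the abstract (screen suspect labels, then relabel them with a model trained on the remaining records) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn is available, uses TF-IDF features and logistic regression as lightweight stand-ins for the BERT classifier, and fits one One-Class SVM per label, which the abstract does not specify. The function name `clean_labels` and the parameter `nu` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM


def clean_labels(texts, labels, nu=0.1):
    """Flag suspect labels with a per-class One-Class SVM, then relabel them.

    Assumption: TF-IDF + logistic regression stand in for the BERT
    classifier described in the abstract.
    """
    features = TfidfVectorizer().fit_transform(texts)
    suspect = [False] * len(texts)

    # Stage 1: within each label group, flag records the One-Class SVM
    # places outside the learned region (fit_predict returns -1 for them).
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        flags = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit_predict(features[idx])
        for i, flag in zip(idx, flags):
            suspect[i] = bool(flag == -1)

    keep = [i for i, s in enumerate(suspect) if not s]
    redo = [i for i, s in enumerate(suspect) if s]
    cleaned = list(labels)

    # Stage 2: train on the records that passed screening and predict new
    # labels for the flagged ones. Skipped if fewer than two classes remain.
    if redo and len({labels[i] for i in keep}) > 1:
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features[keep], [labels[i] for i in keep])
        for i, y in zip(redo, clf.predict(features[redo])):
            cleaned[i] = y
    return cleaned, suspect
```

Calling `clean_labels(texts, labels)` returns the cleaned label list together with a per-record flag marking which labels were treated as suspect; records that pass screening always keep their original label.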
Authors
CUI Zhenxin (崔振新); CAO Zhi (曹志)
Institute of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
Source
China Transportation Review (《综合运输》)
2022, No. 4, pp. 80-83, 115 (5 pages in total)
Keywords
Civil aviation safety information
Unsafe events
Data cleaning
Machine learning