Abstract
Aviation accidents are currently one of the leading causes of serious injuries and deaths worldwide. This motivates researchers to investigate aircraft safety using data analysis approaches based on advanced machine learning algorithms. To assess aviation safety and identify the causes of incidents, a classification model using the light gradient boosting machine (LGBM) has been developed on data from the Aviation Safety Reporting System (ASRS). The model is improved with k-fold cross-validation combined with hybrid sampling (HSCV), which can boost classification performance while maintaining data balance. The results show that the LGBM-HSCV model significantly improves accuracy while alleviating data imbalance. The comparative evaluation consists of a vertical comparison with other cross-validation (CV) methods and a lateral comparison across different numbers of folds. Beyond these comparisons, two further CV approaches derived from the improved method are discussed: one with a different order of sampling and folding, and the other with additional CV. According to the assessment indices obtained with the different methods, the proposed LGBM-HSCV model is effective at identifying incident causes. The improved model for imbalanced data classification may serve as a point of reference for similar data processing tasks, and its accurate identification of civil aviation incident causes can help improve civil aviation safety.
Funding
Supported by the National Natural Science Foundation of China Civil Aviation Joint Fund (U1833110) and by Research on the Dual Prevention Mechanism and Intelligent Management Technology for Civil Aviation Safety Risks (YK23-03-05).