Abstract
Applying data mining to predict railway accident types and analyze their causes is important for building a railway accident early-warning mechanism. This paper proposes a gradient boosting decision tree (GBDT) based algorithm for railway accident type prediction and cause analysis. Railway accident records contain missing values, so we propose a completion algorithm based on attribute distribution probabilities. It preserves the distribution of the original data as far as possible, which reduces the impact of missing data on accident type prediction. Railway accident records are also class-imbalanced, so we propose an ensemble GBDT model that predicts accident types robustly. On this basis, we analyze accident causes by ranking feature importance in the GBDT prediction model. Experiments on an open database verify the effectiveness of the proposed method.
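The abstract does not spell out the completion rule. Here is a minimal sketch, assuming it means filling each missing categorical value by sampling from that attribute's observed value frequencies. This is an unconditional, per-column reading, and the paper's algorithm may also condition on other attributes. The function name `impute_by_distribution` and the toy records are hypothetical.

```python
import random
from collections import Counter


def impute_by_distribution(rows, seed=0):
    """Fill None entries column by column by sampling from that column's
    observed value distribution.

    Assumption: a hypothetical, per-column reading of the abstract's
    "attribute distribution probability" completion.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]  # copy so the input is not mutated
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            continue  # nothing observed in this column; leave it as-is
        counts = Counter(observed)
        values = list(counts)
        weights = [counts[v] for v in values]
        for r in filled:
            if r[j] is None:
                # Draw in proportion to observed frequency, so the completed
                # column keeps roughly the original value distribution.
                r[j] = rng.choices(values, weights=weights, k=1)[0]
    return filled


if __name__ == "__main__":
    records = [
        ["clear", "day"],
        ["rain", None],
        [None, "night"],
        ["clear", "day"],
    ]
    print(impute_by_distribution(records))
```

Sampling, rather than always filling with the most frequent value, avoids inflating the mode. That inflation is the kind of distortion the abstract says the completion step should avoid.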
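The abstract also does not say how the GBDT ensemble handles class imbalance. One common scheme, swapped in here as an assumption, is to train several GBDTs on class-balanced random undersamples and average their predicted probabilities. Averaging the members' impurity-based feature importances then gives a ranking that can support cause analysis. The sketch uses scikit-learn's `GradientBoostingClassifier`. The helper names, the feature names (`speed`, `weather`, `track_age`), and the synthetic data are all hypothetical.

```python
import random
from collections import defaultdict

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def fit_balanced_gbdt_ensemble(X, y, n_models=5, seed=0):
    """Train n_models GBDTs, each on a class-balanced random undersample.

    Assumption: balanced undersampling is one plausible way to build an
    imbalance-robust ensemble; the paper's own scheme may differ.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    m = min(len(members) for members in by_class.values())  # smallest class size
    models = []
    for k in range(n_models):
        # Draw m records from every class so each member trains on balanced data.
        sub = [i for members in by_class.values() for i in rng.sample(members, m)]
        clf = GradientBoostingClassifier(n_estimators=50, random_state=seed + k)
        clf.fit(X[sub], y[sub])
        models.append(clf)
    return models


def predict(models, X):
    # Every member saw all classes, so the predict_proba columns line up.
    proba = np.mean([clf.predict_proba(X) for clf in models], axis=0)
    return models[0].classes_[proba.argmax(axis=1)]


def feature_ranking(models, feature_names):
    # Mean impurity-based importance across members, highest first.
    imp = np.mean([clf.feature_importances_ for clf in models], axis=0)
    order = np.argsort(imp)[::-1]
    return [(feature_names[i], float(imp[i])) for i in order]


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    X = gen.normal(size=(300, 3))
    y = np.where(X[:, 0] > 1.0, 1, 0)  # minority class driven by feature 0
    models = fit_balanced_gbdt_ensemble(X, y)
    print(predict(models, X[:5]))
    print(feature_ranking(models, ["speed", "weather", "track_age"]))
```

Each member is undersampled to the size of the smallest class, so it trains on balanced data. Because the members draw different subsets, the ensemble as a whole still sees much of the majority-class data.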
Authors
ZHONG Min-Hui
ZHANG Wan-Lu
LI You-Ru
ZHU Zhen-Feng
ZHAO Yao
(Institute of Information Science, Beijing Jiaotong University, Beijing 100044; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044)
Source
Acta Automatica Sinica
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2022, No. 2, pp. 470-478 (9 pages)
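Funding
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102101)
Fundamental Research Funds for the Central Universities (2018JBZ001)
National Natural Science Foundation of China (61976018, 61532005)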
Keywords
railway accident type prediction
missing data completion
GBDT
ensemble learning
cause analysis