
Research on Multi-level Classification of High-speed Railway EMU Faults Based on Text Mining (cited by: 3)

Research on Multi-level Classification of High-speed Railway Signal Equipment Fault based on Text Mining
Abstract: This paper targets the text records written after high-speed railway signal equipment fails. It proposes a text-mining-based model for the multi-level classification of high-speed railway signal equipment faults.

Features are extracted from the fault texts with a representation that combines TF-IDF (Term Frequency-Inverse Document Frequency) word weights with a word dictionary.

Within the multi-level model, each single-level classifier follows the Stacking ensemble learning idea:

- Two recurrent neural networks serve as the primary learners: BiGRU (Bidirectional Gated Recurrent Unit) and BiLSTM (Bidirectional Long Short-Term Memory).
- A weighted-combination calculation serves as the secondary learner.
- The multi-level task is decomposed into a separate single classification task at each level.
- The Stacking model is trained with K-fold cross-validation.

The data are signal switch machine fault records from a high-speed railway, covering the period from its opening to its tenth year. Analyzing the fault-cause texts, the model performs a two-level classification: fault location, then fault cause.

After training with K = 5, BiGRU scored higher than BiLSTM on every evaluation metric. Experiments set the weights at 0.7 for BiGRU and 0.3 for BiLSTM. Combining the two networks' outputs with these weights raised accuracy to 0.8814 and recall to 0.8642.

The experiments show that the multi-level model effectively improves the evaluation metrics of the multi-level signal equipment fault classification task. It also guarantees that the classification results respect the subordinate (parent-child) relationships between levels.
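The abstract does not give the exact rule for combining TF-IDF word weights with the word dictionary. The sketch below assumes one plausible reading of it:

- The domain dictionary fixes the feature vocabulary.
- Each dictionary term is weighted by the classic tf × log(N/df) score.
- Word segmentation (a separate step for Chinese text) has already been done.

The function name `tfidf_dictionary_features`, the sample records, and the dictionary are all hypothetical.

```python
import math
from collections import Counter
from typing import List


def tfidf_dictionary_features(docs: List[List[str]],
                              dictionary: List[str]) -> List[List[float]]:
    """Return one TF-IDF vector per document, indexed by `dictionary`.

    Assumptions (not taken from the paper): documents are already segmented
    into words, tf = count / document length, idf = log(N / df), and words
    outside the dictionary are dropped.
    """
    n_docs = len(docs)
    index = {term: i for i, term in enumerate(dictionary)}

    # Document frequency, counted only for dictionary terms.
    df = Counter()
    for tokens in docs:
        for term in set(tokens):
            if term in index:
                df[term] += 1

    vectors = []
    for tokens in docs:
        vec = [0.0] * len(dictionary)
        counts = Counter(t for t in tokens if t in index)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            vec[index[term]] = tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical segmented fault records and domain dictionary (illustrative only).
docs = [
    ["switch", "indication", "lost", "contact", "poor"],
    ["motor", "jammed", "switch"],
    ["indication", "circuit", "contact", "poor"],
]
dictionary = ["indication", "contact", "motor", "jammed", "circuit"]
features = tfidf_dictionary_features(docs, dictionary)
print([round(x, 3) for x in features[1]])  # [0.0, 0.0, 0.366, 0.366, 0.0]
```

Restricting the vocabulary to a curated dictionary keeps the vector length fixed and discards words that carry no fault information. Here "switch" appears in two records but is not a dictionary term, so it contributes nothing. The abstract does not say how these features are then fed to the recurrent networks.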
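The secondary learner described in the abstract is a weighted combination of the two base learners' outputs, with 0.7 for BiGRU and 0.3 for BiLSTM. The sketch below applies those weights to per-class probability vectors and then decodes the two levels in order.

Several parts of this sketch are assumptions rather than details from the paper:

- The fault-cause choice is restricted to causes that belong to the predicted fault location. This is one way to keep the subordinate relationships correct; the abstract does not spell out its method.
- The BiGRU/BiLSTM outputs are hard-coded, hypothetical numbers. They stand in for networks trained with 5-fold cross-validation.
- The function names and class labels are illustrative.

```python
from typing import List, Sequence, Tuple


def weighted_combine(p_gru: Sequence[float], p_lstm: Sequence[float],
                     w_gru: float = 0.7, w_lstm: float = 0.3) -> List[float]:
    """Secondary learner: fixed weighted sum of the two base learners' class scores."""
    return [w_gru * a + w_lstm * b for a, b in zip(p_gru, p_lstm)]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def hierarchical_predict(loc_scores: Sequence[float],
                         cause_scores: Sequence[float],
                         cause_parent: Sequence[int]) -> Tuple[int, int]:
    """Level 1 picks the fault location; level 2 picks the best-scoring cause
    among those whose parent is that location, so the pair is always consistent."""
    loc = argmax(loc_scores)
    allowed = [j for j, parent in enumerate(cause_parent) if parent == loc]
    cause = max(allowed, key=lambda j: cause_scores[j])
    return loc, cause


# Hypothetical softmax outputs for one fault record (not data from the paper).
# Locations: 0 = motor, 1 = indication circuit.
# Causes 0 and 1 belong to location 0; causes 2 and 3 belong to location 1.
cause_parent = [0, 0, 1, 1]
loc_scores = weighted_combine([0.8, 0.2], [0.4, 0.6])  # about [0.68, 0.32]
cause_scores = weighted_combine([0.3, 0.1, 0.5, 0.1],
                                [0.2, 0.1, 0.6, 0.1])  # about [0.27, 0.10, 0.53, 0.10]
print(hierarchical_predict(loc_scores, cause_scores, cause_parent))  # (0, 0)
```

In this example the highest combined cause score (cause 2, about 0.53) belongs to location 1, which would contradict the level-1 prediction. The constrained decoding returns cause 0 instead. Python's `max` raises an error on an empty sequence, so a real hierarchy needs at least one child cause per location.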
Authors: Gao Fan, Li Fan, Zhang Ming, Wang Zhifei, Zhao Junhua

Affiliations:
- Postgraduate Department, China Academy of Railway Sciences, Beijing 100081, China
- China Academy of Railway Sciences Corporation Limited, Beijing 100081, China
- Beijing Jingwei Information Technologies Co., Ltd., Beijing 100081, China
Source: Computer Measurement & Control (《计算机测量与控制》), 2020, No. 7, pp. 59-63 (5 pages)
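Funding:
- National Natural Science Foundation of China (51967010)
- Key Project of China Academy of Railway Sciences Corporation Limited (2019YJ115)
- Youth Project of China Academy of Railway Sciences Corporation Limited (2019YJ125)
- Special Research Project of China State Railway Group Co., Ltd. (J2019X005)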
Keywords: high-speed railway signal equipment; multi-level classification; Stacking ensemble learning; recurrent neural network; multi-task collaborative voting decision tree