
Pre-processing of Metro Signaling Equipment Fault Text Based on Fusion of Lexical Domain and Semantic Domain

Times cited: 11
Abstract: Data pre-processing is a prerequisite for data-driven fault diagnosis. To better extract data features, a text pre-processing method that fuses the lexical domain and the semantic domain was proposed for metro signaling equipment fault records. A Hidden Markov Model (HMM) was applied to identify line-specific signaling terms that are not included in the existing urban rail transit vocabulary; combined with that vocabulary, these terms form a dedicated signaling-equipment fault vocabulary for the given line. After the fault records were represented in the lexical domain and the semantic domain on the basis of this vocabulary, a bag-of-words representation of each record was formed by clustering, feature-word extraction, and feature fusion across both domains. Building on expert templates, KNN classification was then applied to standardize the descriptions of fault records, addressing their irregularity and ambiguity. Experiments using the fault records of a metro line from 2015 to 2017 as the data source show that the method is effective, with a test macro-average F1-score of 95.56%.
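The abstract does not say how the HMM is configured for term identification. A common formulation for finding out-of-vocabulary Chinese words tags each character as B (begin), M (middle), E (end), or S (single-character word) and decodes the most likely tag sequence with the Viterbi algorithm. The sketch below assumes that formulation; the toy probabilities, the example string 道岔故障 ("turnout fault"), and the `EXISTING_VOCAB` set are hypothetical illustrations, not parameters or data from the paper.

```python
import math

# BMES character tags: Begin / Middle / End of a multi-character word, or Single-character word.
STATES = ("B", "M", "E", "S")

# Hypothetical toy parameters for illustration only (NOT taken from the paper).
# In practice they would be estimated from a tagged corpus of fault records.
START = {"B": 0.6, "M": 0.0, "E": 0.0, "S": 0.4}
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT = {
    "B": {"道": 0.5, "故": 0.5},
    "M": {},
    "E": {"岔": 0.5, "障": 0.5},
    "S": {"道": 0.1, "故": 0.1},
}
FLOOR = 1e-8  # stand-in probability for impossible or unseen events


def logp(p: float) -> float:
    """Log-probability, flooring zeros so the log stays finite."""
    return math.log(p if p > 0 else FLOOR)


def viterbi(text: str) -> list[str]:
    """Return the most likely BMES tag sequence for `text`."""
    scores = [{s: logp(START[s]) + logp(EMIT[s].get(text[0], 0.0)) for s in STATES}]
    back: list[dict[str, str]] = [{}]
    for t in range(1, len(text)):
        scores.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for tag s at position t.
            prev, score = max(
                ((p, scores[t - 1][p] + logp(TRANS[p].get(s, 0.0))) for p in STATES),
                key=lambda pair: pair[1],
            )
            scores[t][s] = score + logp(EMIT[s].get(text[t], 0.0))
            back[t][s] = prev
    # A word can only end in E or S, so the final tag is restricted to those.
    tags = [max(("E", "S"), key=lambda s: scores[-1][s])]
    for t in range(len(text) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


def segment(text: str) -> list[str]:
    """Cut `text` into words after every E or S tag."""
    words, buf = [], ""
    for ch, tag in zip(text, viterbi(text)):
        buf += ch
        if tag in ("E", "S"):
            words.append(buf)
            buf = ""
    if buf:
        words.append(buf)
    return words


# Hypothetical existing general lexicon; multi-character words missing from it
# are kept as candidate line-specific terms.
EXISTING_VOCAB = {"故障"}
candidates = [w for w in segment("道岔故障") if len(w) > 1 and w not in EXISTING_VOCAB]
print(segment("道岔故障"), candidates)  # prints: ['道岔', '故障'] ['道岔']
```

In the paper's pipeline, the identified terms are combined with the existing vocabulary to form the line-specific signaling vocabulary; the simple "not already in the lexicon" filter above is only one plausible way to select candidates.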
Authors: HU Xiaoxi, NIU Ru, TANG Tao (State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China)
Source: Journal of the China Railway Society (《铁道学报》), 2021, No. 2, pp. 78-85 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
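Funding: National Natural Science Foundation of China (U1934219); Beijing Natural Science Foundation (L181006); Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (N2020G019); Beijing Laboratory of Urban Rail Transit.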
Keywords: text pre-processing; metro signaling equipment; HMM; K-means; LDA
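Related literature: references 5; secondary references 105; co-citing literature 392; co-cited literature 105; citing literature 11; secondary citing literature 17.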
