Abstract
State Grid Corporation of China has accumulated a large number of typical fault cases in its informatization work, but most of them are descriptive text that is difficult to understand and analyze by automated means. To address this problem, we use text mining to extract fault problems and their causes from the fault cases and form causal relationships between them, which provides a necessary basis for further mining of fault texts. The method first converts causal-relation extraction into a three-class sentence classification problem, which narrows the target set and improves accuracy. Each sentence is then given a distributed text representation, and a bidirectional long short-term memory (BiLSTM) classification model extracts the deep semantic features of event sentences. Experiments on transformer fault cases show that BiLSTM classifies fault-text sentences better than unidirectional LSTM and a convolutional neural network (CNN) and extracts fault and cause information more accurately, with an F1 score (the harmonic mean of precision and recall) of 67%.
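The abstract reports its result as the harmonic mean of precision and recall over a three-class sentence classification. As a minimal sketch of how such a score can be computed, the snippet below derives per-class precision, recall, and F1 from gold and predicted sentence labels. The label names (`fault`, `cause`, `other`), the function `per_class_f1`, and the toy data are assumptions made for illustration; the abstract does not name the classes or say how the reported 67% was aggregated.

```python
from collections import Counter

# Hypothetical label names: the abstract only states that sentences
# are sorted into three classes.
LABELS = ("fault", "cause", "other")


def per_class_f1(gold, pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for parallel label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the sentence was not p
            fn[g] += 1  # missed a sentence whose true label was g
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data for illustration only; not taken from the paper.
gold = ["fault", "cause", "other", "fault", "cause", "other"]
pred = ["fault", "other", "other", "fault", "cause", "cause"]
scores = per_class_f1(gold, pred)
```

Reporting the three scores per class, rather than a single accuracy figure, shows whether errors concentrate in the fault sentences or the cause sentences, which matters when the two are later paired into causal relations.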
Authors
杜修明
秦佳峰
郭诗瑶
闫丹凤
DU Xiuming; QIN Jiafeng; GUO Shiyao; YAN Danfeng (Electric Power Research Institute of Shandong Power Supply Company of State Grid, Jinan 250000, China; State Key Laboratory of Networking and Switching, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source
《高电压技术》
EI
CAS
CSCD
PKU Core (北大核心)
2018, No. 4, pp. 1078-1084 (7 pages)
High Voltage Engineering
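Funding
National High Technology Research and Development Program of China (863 Program) (2015AA050204)
Science and Technology Project of State Grid Corporation of China (520626170011)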