Abstract
Generalizing accident risk factor texts is an important step in building a knowledge graph of accident risk factor evolution for oil and gas storage and transportation enterprises. Existing event text generalization methods are limited in semantic representation and suffer from word segmentation errors when applied to the risk factor texts that these enterprises accumulate during production. To address these problems, and given the complex and variable language of safety management texts in such enterprises, a text generalization method for accident risk factors based on char-word feature agglomerative nesting (Char-Word Feature Based AGNES, CW-AGNES) is proposed. First, character feature vectors and bigram word feature vectors of oil and gas storage and transportation accidents are obtained with Word2Vec. Next, the accident risk factor texts are vectorized using the pre-trained word vector model. Then the char-word features of the texts are incorporated into agglomerative hierarchical clustering, which reduces the error introduced by word segmentation while retaining the semantic information of the words, and thereby generalizes the risk factor texts. CW-AGNES was applied to real safety management texts from oil and gas storage and transportation enterprises and compared with other generalization methods. The results show that it generalizes better, improving the AMI, ARI, V-measure, and FMI evaluation metrics by 2.44% to 5.74%. The method can support research on constructing accident risk knowledge graphs in the oil and gas storage and transportation field.
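The pipeline in the abstract (character and bigram features, agglomerative clustering, then scoring with AMI, ARI, V-measure, and FMI) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' CW-AGNES implementation: TF-IDF weights over character unigrams and bigrams stand in for the Word2Vec character and bigram embeddings, the six short texts and their labels are hypothetical toy data, and the function names `char_bigram_features`, `generalize`, and `evaluate` are invented for this example.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    fowlkes_mallows_score,
    v_measure_score,
)

# Hypothetical toy risk-factor texts (not from the paper's dataset).
texts = [
    "管道腐蚀泄漏", "管线腐蚀穿孔", "管道外壁腐蚀",
    "阀门密封失效", "阀门内漏", "阀门关闭不严",
]
# Hypothetical ground-truth groups: 0 = pipe corrosion, 1 = valve failure.
labels_true = [0, 0, 0, 1, 1, 1]


def char_bigram_features(docs):
    """Vectorize texts from character unigrams and bigrams.

    Assumption: TF-IDF n-gram weights replace the Word2Vec embeddings
    described in the abstract. No word segmentation is needed, so
    segmentation errors cannot propagate into the features.
    """
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
    return vectorizer.fit_transform(docs).toarray()


def generalize(docs, n_clusters):
    """Group texts bottom-up with agglomerative (AGNES-style) clustering."""
    features = char_bigram_features(docs)
    # Default Ward linkage on Euclidean distance between the
    # L2-normalized TF-IDF vectors.
    model = AgglomerativeClustering(n_clusters=n_clusters)
    return model.fit_predict(features)


def evaluate(truth, predicted):
    """Compute the four clustering metrics named in the abstract."""
    return {
        "AMI": adjusted_mutual_info_score(truth, predicted),
        "ARI": adjusted_rand_score(truth, predicted),
        "V-measure": v_measure_score(truth, predicted),
        "FMI": fowlkes_mallows_score(truth, predicted),
    }


labels_pred = generalize(texts, n_clusters=2)
scores = evaluate(labels_true, labels_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

All four metrics compare a predicted grouping against reference labels and are invariant to how the cluster IDs are numbered, which is why they suit the evaluation of text generalization results.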
Authors
ZHANG Xiyue (张曦月)
HU Jinqiu (胡瑾秋)
ZHANG Laibin (张来斌)
DONG Shaohua (董绍华)
XU Kangkai (徐康凯)
College of Safety and Ocean Engineering, China University of Petroleum (Beijing)
Source
Oil & Gas Storage and Transportation (《油气储运》)
CAS
PKU Core Journal (北大核心)
2021, Issue 11, pp. 1242-1249 (8 pages)
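Funding
National Key R&D Program of China, "Theory and Methods of Intelligent Drilling for Complex Oil and Gas", 2019YFA0708304
National Natural Science Foundation of China, "New Disaster-Causing Mechanisms and Early Warning of Failures in Intelligent Oil and Gas Pipeline Systems under Information Security Threats", 52074323
Science Foundation of China University of Petroleum (Beijing), "Key Technologies for the Safety and Integrity of Offshore Oil and Gas Pipeline Systems", ZX20200137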
Keywords
oil and gas storage and transportation enterprises
accident risk factors
textual generalization
char-word features
knowledge graph
safety management