Abstract
Green standards are an important basis for regulating enterprise energy use and promoting green industrial development, and their application and enforcement are receiving growing attention. At present, green standard information is retrieved mainly by keyword, so the associations and knowledge implicit in the standards cannot be retrieved effectively, which restricts the application and promotion of green standards. Taking green standards as its object, this paper analyzes the structural characteristics of standard documents, combines them with the content theme characteristics of their unstructured text, builds a green standard data extraction model based on content themes, fuses the resulting standard entities, and completes the construction of a knowledge graph knowledge base for the green standards domain. Ontology construction, data extraction, data fusion, and data storage were carried out on more than 700 standard documents; the results show that the proposed knowledge graph construction method for green standards is effective and lay a data foundation for the subsequent development of a knowledge question answering system for green standards.
Authors
张鹏飞
袁志祥
鲍威
洪旭东
ZHANG Peng-fei; YUAN Zhi-xiang; BAO Wei; HONG Xu-dong (Anhui University of Technology; China National Institute of Standardization)
Source
《标准科学》
2020, Issue 6, pp. 68-73 (6 pages)
Standard Science
Funding
Supported by the National Key Research and Development Program of China (Project No. 2016YFF020440508).
Keywords
green standard
content theme characteristics
data extraction
knowledge graph