Abstract
[Objective] This paper addresses the data consistency problems that traditional triple-based knowledge graph representations face when sci-tech literature data changes, and builds a large-scale scientific research knowledge graph that meets the needs of sci-tech information services. [Methods] We propose an implicit knowledge graph construction method. It introduces the concepts of entity feature fields and implicit relationships and pairs them with an identification tool for entity feature fields and a discovery tool for implicit relationships, so that entities are updated continuously and relationships between entities are discovered and linked automatically. [Results] The method has been applied in practice on a PB-level sci-tech literature big data platform. When entity data changes, the implicit knowledge graph only needs to update the entity data and does not need to modify the relationship data. Retrieval performance is also strong: retrieving all scholars of an institution through a predefined interface took, on average, one hundredth of the time required by the triple-based knowledge graph. [Limitations] Cases that do not fit the implicit relationship data structure are difficult to solidify, and the entity data must be stored in a technical cluster equipped with a search engine. [Conclusions] The proposed method effectively resolves the data consistency problems caused by changes in entity information. It is suited to building large-scale scientific research knowledge graphs and supports the efficient management, dissemination, and utilization of sci-tech knowledge.
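The contrast drawn in the abstract between stored triples and implicit relationships can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the `institution` feature field, and the `affiliated_with` predicate are hypothetical, and a linear scan stands in for the search-engine query that the abstract says the entity store requires.

```python
from dataclasses import dataclass


@dataclass
class Scholar:
    scholar_id: str
    name: str
    institution: str  # hypothetical entity feature field


class TripleGraph:
    """Baseline: relations are stored as (subject, predicate, object) triples."""

    def __init__(self):
        self.entities = {}
        self.triples = set()

    def upsert(self, scholar):
        old = self.entities.get(scholar.scholar_id)
        if old is not None:
            # The stale relation has to be removed explicitly,
            # otherwise the graph becomes inconsistent.
            self.triples.discard((old.scholar_id, "affiliated_with", old.institution))
        self.entities[scholar.scholar_id] = scholar
        self.triples.add((scholar.scholar_id, "affiliated_with", scholar.institution))

    def scholars_of(self, institution):
        ids = [s for (s, p, o) in self.triples
               if p == "affiliated_with" and o == institution]
        return sorted(self.entities[i].name for i in ids)


class ImplicitGraph:
    """Sketch of the implicit idea: no relation data is stored at all."""

    def __init__(self):
        self.entities = {}

    def upsert(self, scholar):
        # Updating the entity is the only write that is needed.
        self.entities[scholar.scholar_id] = scholar

    def scholars_of(self, institution):
        # The relation is resolved at query time from the feature field.
        # A linear scan stands in for a search-engine lookup (assumption).
        return sorted(s.name for s in self.entities.values()
                      if s.institution == institution)


def demo():
    results = {}
    for graph in (TripleGraph(), ImplicitGraph()):
        graph.upsert(Scholar("s1", "Scholar A", "Institute X"))
        graph.upsert(Scholar("s2", "Scholar B", "Institute X"))
        # Scholar A changes institution: an entity data change.
        graph.upsert(Scholar("s1", "Scholar A", "Institute Y"))
        results[type(graph).__name__] = (
            graph.scholars_of("Institute X"),
            graph.scholars_of("Institute Y"),
        )
    return results
```

Both graphs return the same answers in `demo()`, but only `TripleGraph.upsert` has to touch relation data when an entity changes. The sketch says nothing about retrieval speed, which in the paper depends on the underlying search engine.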
Authors
杜悦
常志军
董美
钱力
王颖
Du Yue; Chang Zhijun; Dong Mei; Qian Li; Wang Ying (National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China)
Source
《数据分析与知识发现》
CSSCI
CSCD
Peking University Core Journal
2023, Issue 2, pp. 141-150 (10 pages)
Data Analysis and Knowledge Discovery
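Funding
This work is one of the research outcomes of the Literature and Information Capacity Building Project of the Chinese Academy of Sciences (Grant No. Y9100901).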
Keywords
Knowledge Graph
Data Consistency
Sci-Tech Big Data