Abstract
With the development of big data technology, large volumes of heterogeneous data are generated in many fields, and constructing knowledge graphs is an important means of achieving semantic interoperability across such data. A common way to build the instance layer of a knowledge graph is to generate an instance model by mapping structured data to an ontology model. However, for complex, heterogeneous domain data, most existing mapping-based instance construction methods require users to complete every mapping manually; the mapping operations are tedious, no intelligent matching is available, and the process is time-consuming and error-prone. In addition, existing methods offer insufficient support for incremental updates after instances have been imported. To address the cumbersome mapping operations of existing schema matching and instance construction methods, this paper proposes an instance construction and evolution method based on intelligent mapping recommendation. The intelligent mapping reuse recommendation mechanism performs schema matching before the user maps anything manually: it combines element-level similarity, table-level similarity, and inter-table propagation similarity into a multilevel similarity score, then arbitrates and ranks candidates by schema matching degree to generate recommended mappings. The incremental discovery mechanism automatically detects redundant and conflicting instances and generates background system tasks to handle them, enabling efficient, duplicate-free import of instances. Experiments were conducted on the Shandong government open dataset and the Shenzhen medical emergency dataset. With the mapping reuse recommendation module, interaction time was reduced to about 26% of that in the traditional mode (roughly 3 to 4 times shorter), and field recommendation matching accuracy reached 98.1%. In the incremental discovery experiment, the time required to import 13.94 million instance nodes and 21.58 million relationship edges fell from 31.21 h to 2.23 h. These results verify the usability and matching accuracy of intelligent mapping reuse recommendation and show that it improves the efficiency of instance layer construction and evolution.
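The multilevel similarity idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the string-ratio measure, the weights `w_elem` and `w_table`, and all function names are assumptions made for illustration, and inter-table propagation similarity is left out for brevity.

```python
from difflib import SequenceMatcher


def element_similarity(a: str, b: str) -> float:
    """Element-level similarity: case-insensitive string ratio (an assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def table_similarity(columns, properties) -> float:
    """Table-level similarity: mean of each column's best element-level match."""
    if not columns or not properties:
        return 0.0
    best = [max(element_similarity(c, p) for p in properties) for c in columns]
    return sum(best) / len(columns)


def recommend_mappings(columns, ontology, w_elem=0.7, w_table=0.3, top_k=1):
    """Rank (class, property) candidates for every source column.

    ontology maps a class name to its list of property names.
    The weights are hypothetical, not values taken from the paper.
    """
    table_scores = {cls: table_similarity(columns, props)
                    for cls, props in ontology.items()}
    recommendations = {}
    for col in columns:
        scored = [
            (w_elem * element_similarity(col, prop) + w_table * table_scores[cls],
             cls, prop)
            for cls, props in ontology.items()
            for prop in props
        ]
        # Highest combined score first; keep only the top_k candidates.
        scored.sort(key=lambda t: t[0], reverse=True)
        recommendations[col] = [(cls, prop) for _, cls, prop in scored[:top_k]]
    return recommendations
```

Calling `recommend_mappings(["name", "gender"], {"Patient": ["name", "gender", "age"], "Vehicle": ["plate", "station"]})` would rank the `Patient` properties first for both columns, since exact name matches and the higher table-level score both favor that class.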
Authors
张雅晴
单中原
赵俊峰
王亚沙
ZHANG Yaqing; SHAN Zhongyuan; ZHAO Junfeng; WANG Yasha (School of Computer Science, Peking University, Beijing 100871, China; Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing 100871, China; Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2023, No. 6, pp. 142-150 (9 pages)
Funding
National Natural Science Foundation of China (62172011)
Fundamental Research Funds for the Central Universities
Keywords
Knowledge graph
Schema matching
Mapping reuse
Instance construction
Graph evolution