Abstract
Countries and regions around the world now regard big data as a critical strategic resource. However, the big data era is marked by obstacles to data circulation and insufficient data regulation, which lead to severe data silos, poor data quality, and difficulty in unleashing the potential of data elements. This has driven researchers to explore data integration techniques that break down data barriers, enable information sharing, improve data quality, and thereby activate the potential of data elements. Relational data and knowledge graphs, two essential forms of data organization and storage, are widely used in practice. This paper therefore focuses on relational data and knowledge graphs, summarizing and analyzing key data integration technologies in three areas: entity resolution, data fusion, and data cleaning. Finally, it discusses future research directions and trends.
Authors
高云君
葛丛丛
郭宇翔
陈璐
GAO Yun-Jun; GE Cong-Cong; GUO Yu-Xiang; CHEN Lu (College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; Data Intelligence Innovation Lab, Huawei Cloud Computing Technologies Co., Ltd., Hangzhou 310052, China)
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2023, Issue 5, pp. 2365-2391 (27 pages)
Journal of Software
Funding
National Key Research and Development Program of China (2021YFC3300300, 2021YFC3300303)
National Natural Science Foundation of China (62025206, 61972338, 62102351)
Keywords
relational data
knowledge graph (KG)
data integration