
Research on an ETL Method for Transforming Relational Data to Graph Data Based on Sub-schema (基于子模式的关系数据到图数据ETL方法研究)

Cited by: 4
Abstract: Graph databases outperform relational databases on problems such as multi-level relationship queries and community detection. However, a large amount of existing data is stored in relational form, so how to extract, transform, and load (ETL) relational data into graph data efficiently and completely is an important problem in applying graph databases. Existing work on this problem suffers from three major limitations: (1) the quality of the converted graph data is poor; (2) the transformation efficiency is low; (3) the transformed results are not well suited to distributed storage. To overcome these limitations, this paper proposes a sub-schema-based ETL method for transforming relational data to graph data, which improves the procedure and algorithms of previous ETL methods. The method splits the relational database schema into several sub-schemas and performs ETL on them in parallel. This improves ETL efficiency; the transformed results also satisfy the requirements of distributed graph storage and can serve as base data for the Spark GraphX computing framework. Finally, a prototype system was developed with Java EE and Neo4j and used for experimental verification. The results show that the improved ETL method achieves better transformation performance than existing methods.
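The abstract states that the schema is split into sub-schemas that are processed in parallel, but it does not say how the split is made or how rows are mapped to graph elements. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a sub-schema is a connected component of the foreign-key graph, that each row becomes a node, and that each foreign-key value becomes an edge. The toy `SCHEMA` and `ROWS` data and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy schema: table -> (primary key, {fk_column: referenced_table})
SCHEMA = {
    "customer": ("id", {}),
    "orders": ("id", {"customer_id": "customer"}),
    "sensor": ("id", {}),
    "reading": ("id", {"sensor_id": "sensor"}),
}
# Hypothetical in-memory rows standing in for the relational source.
ROWS = {
    "customer": [{"id": 1, "name": "Ann"}],
    "orders": [{"id": 10, "customer_id": 1}],
    "sensor": [{"id": 7, "kind": "temp"}],
    "reading": [{"id": 70, "sensor_id": 7, "value": 21.5}],
}


def split_sub_schemas(schema):
    """Group tables into connected components of the foreign-key graph.

    This splitting rule is an assumption; the abstract does not define it.
    """
    neighbours = {table: set() for table in schema}
    for table, (_, fks) in schema.items():
        for ref in fks.values():
            neighbours[table].add(ref)
            neighbours[ref].add(table)
    seen, groups = set(), []
    for start in schema:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over linked tables
            table = stack.pop()
            if table in group:
                continue
            group.add(table)
            stack.extend(neighbours[table] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def etl_sub_schema(tables):
    """Turn rows into nodes and foreign-key values into edges for one sub-schema."""
    nodes, edges = [], []
    for table in tables:
        pk, fks = SCHEMA[table]
        for row in ROWS[table]:
            node_id = (table, row[pk])
            props = {k: v for k, v in row.items() if k not in fks}
            nodes.append((node_id, props))
            for col, ref in fks.items():
                if row.get(col) is not None:
                    edges.append((node_id, col, (ref, row[col])))
    return nodes, edges


def run_etl():
    """Run the per-sub-schema ETL concurrently, one task per sub-schema."""
    groups = split_sub_schemas(SCHEMA)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(etl_sub_schema, groups))
    return groups, results
```

Because no foreign key crosses a sub-schema boundary under this assumed rule, each task produces a self-contained set of nodes and edges, which is one plausible reason such output would partition cleanly for distributed storage. In a real pipeline each task would read from the relational source and write to a graph store such as Neo4j; those steps are omitted here.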
Authors: DING Qianglong (丁强龙); WANG Jin (王津); ZHANG Xuejie (张学杰) (School of Information Science and Engineering, Yunnan University, Kunming 650091, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2017, No. 12, pp. 76-84 (9 pages)
Funding: National Natural Science Foundation of China (No. 61170222)
Keywords: graph database; distributed storage; extract-transform-load (ETL); sub-schema