FusionDB: A SPARQL Query Processing System Based on a Distributed Query Engine and HDFS (cited by: 1)

FusionDB: Evaluating SPARQL Queries on Distributed Query Engine and HDFS
Abstract: The number of RDF triples on the Internet has grown rapidly in recent years, and traditional single-machine SPARQL query processing can no longer meet practical needs. Existing distributed SPARQL query processing systems fall into two classes: Hadoop-based and database-cluster-based. The former mainly evaluate queries as a series of MapReduce jobs, which is inefficient; the latter inherit the limited scalability of traditional database clusters. This paper proposes a novel SPARQL query processing system, FusionDB, built on a distributed query engine and HDFS. FusionDB thereby benefits from established distributed database techniques, such as distributed joins, pipelining, and load balancing, while obtaining good fault tolerance and high scalability from Hadoop. To further speed up query processing, FusionDB also builds Trojan indexes over HDFS files. Experiments show that FusionDB has a clear performance advantage over traditional systems.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2015, Issue S1, pp. 139-142 (4 pages)
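Funding: Renmin University of China pre-research commissioned (team fund) project (14XNLQ06); fund of the Beijing Engineering Laboratory of Heterogeneous Big Data Analysis, Mining, and Integration Technology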
Keywords: SPARQL; query rewriting; distributed query engine; HDFS; index
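
The abstract and keywords mention query rewriting for a distributed query engine but do not describe how FusionDB performs it. As a rough illustration only, the sketch below shows one common way to rewrite a SPARQL basic graph pattern into a SQL self-join over a single triple table. The table name `triples`, its columns `s`, `p`, `o`, the function `bgp_to_sql`, and the `ub:` terms in the example are all assumptions made for this sketch and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# One triple pattern: (subject, predicate, object); variables start with "?".
TriplePattern = Tuple[str, str, str]


def bgp_to_sql(patterns: List[TriplePattern], table: str = "triples") -> str:
    """Rewrite a basic graph pattern into one SQL self-join query.

    Assumes a hypothetical triple table with columns s, p, o.
    """
    first_seen: Dict[str, str] = {}  # variable -> first column that binds it
    conditions: List[str] = []
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes an equi-join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant term becomes a selection predicate.
                escaped = term.replace("'", "''")
                conditions.append(f"{ref} = '{escaped}'")
    # Project each variable once, named after the variable without "?".
    select = ", ".join(f"{col} AS {var[1:]}" for var, col in first_seen.items())
    from_clause = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select or '*'} FROM {from_clause}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Hypothetical query: students together with their advisors.
example = [("?x", "rdf:type", "ub:Student"), ("?x", "ub:advisor", "?y")]
print(bgp_to_sql(example))
```

Each triple pattern maps to one alias of the triple table, constants become filters, and shared variables become join conditions, which is the kind of query a SQL-capable distributed engine could then plan as a distributed join.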