
Index-Based Distributed RDF Query Optimization Algorithm (cited by: 1)

Distributed Optimized Query Algorithm Based on Index
Abstract: Using index files to assist query processing on the Hadoop platform is a new approach to querying massive RDF (Resource Description Framework) data. Existing Hadoop-based RDF query methods rarely use index files, and they mainly target static RDF data, so they handle dynamic data updates poorly. To overcome these two drawbacks, this paper proposes the IMSQ (using Index in MapReduce to Segment and Query) algorithm for distributed querying of RDF files. The algorithm has two parts, segmentation and query execution. First, it performs a single star-shaped segmentation of the RDF data, producing a number of segment files and building the corresponding index files. Second, at query time, it generates a layered join plan and applies a filter-selection strategy: it consults the index files first to narrow the set of candidate files, and then runs the query on the corresponding segment files. Finally, it merges the results and outputs them. Experiments on the LUBM dataset show that IMSQ has a clear advantage in query efficiency when the data volume is large.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2014, No. 11, pp. 233-238 (6 pages)
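Funding: Supported by the Fuzhou University Science and Technology Development Fund (2013-XQ-32); the Open Research Fund of the Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education (201006); the 2011 Fujian Province Science and Technology Support-the-Army Fund (JG2011005); and the Natural Science Foundation of Fujian Province (2012J01168).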
Keywords: Hadoop; RDF; index; MapReduce
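
The segment-then-query flow summarized in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's implementation: it does not use Hadoop or MapReduce, and the index layout (predicate mapped to the segment ids that contain it), the toy hash, and the function names `star_segment`, `build_index`, and `query_star` are all assumptions made for illustration. It only shows the general idea that grouping triples by subject keeps each "star" in one segment, so an index lookup can rule out segments before any segment is scanned.

```python
from collections import defaultdict


def star_segment(triples, num_segments):
    """Assign every triple to a segment chosen by its subject.

    All triples sharing a subject (one "star") land in the same segment.
    The hash below is a deterministic toy choice, not the paper's method.
    """
    segments = defaultdict(list)
    for s, p, o in triples:
        seg_id = sum(map(ord, s)) % num_segments
        segments[seg_id].append((s, p, o))
    return dict(segments)


def build_index(segments):
    """Build a hypothetical index: predicate -> set of segment ids."""
    index = defaultdict(set)
    for seg_id, triples in segments.items():
        for _, p, _ in triples:
            index[p].add(seg_id)
    return dict(index)


def query_star(segments, index, pattern):
    """Answer a star query: find subjects matching every (predicate, object).

    `pattern` maps predicate -> required object, or None for "any object".
    Returns (matching subjects, segment ids that were actually scanned).
    """
    # Filter step: keep only segments that contain every queried predicate.
    candidates = None
    for p in pattern:
        segs = index.get(p, set())
        candidates = set(segs) if candidates is None else candidates & segs
    candidates = sorted(candidates or ())

    # Query step: scan the surviving segments, then merge the results.
    results = []
    for seg_id in candidates:
        by_subject = defaultdict(lambda: defaultdict(set))
        for s, p, o in segments[seg_id]:
            by_subject[s][p].add(o)
        for s, props in by_subject.items():
            if all(p in props and (o is None or o in props[p])
                   for p, o in pattern.items()):
                results.append(s)
    return sorted(results), candidates


if __name__ == "__main__":
    # Made-up sample triples, loosely in the style of a university dataset.
    triples = [
        ("alice", "type", "Student"),
        ("alice", "takesCourse", "c1"),
        ("bob", "type", "Student"),
        ("bob", "advisor", "prof1"),
        ("prof1", "type", "Professor"),
        ("prof1", "teacherOf", "c1"),
    ]
    segments = star_segment(triples, num_segments=3)
    index = build_index(segments)
    subjects, scanned = query_star(
        segments, index, {"type": "Student", "takesCourse": None}
    )
    print(subjects, scanned)
```

In this sketch a query touching a rare predicate scans only the segments listed for it in the index. Queries that join several stars would need an additional join step across segments, which the abstract's layered join plan addresses but which is not modeled here.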
