摘要
在Hadoop平台中采用索引文件来辅助查询是解决海量RDF(Resource Description Framework)查询的一种新思路。目前在Hadoop平台中实现的RDF查询都较少利用索引文件,且主要针对RDF的静态数据,对数据动态更新操作的兼容性都比较差。为了克服这两个缺点,提出IMSQ(using Index in MapReduce to Segment and Query)算法来对RDF文件进行分布式查询。该算法主要分为分割和查询两部分:首先为RDF进行一次星形分割,得到若干个分割,文件并建立索引文件;其次在查询时,按照分层生成连接计划,采用过滤选择策略,先找索引文件,缩小文件集,再对相应的分割文件进行查询;最后进行一次结果合并和输出。在LUBM数据集上进行的测试实验表明,在数据量大的情况下IMSQ方法的查询效率具有明显的优势。
Using index file is a new way of solving the large amount of RDF(Resource Description Framework)query problem,which can be a great aid to query optimization.At present,most of the RDF query optimization method based on Hadoop do not use index file,and most of them aim at static data so they perform poorly at dynamic updating of data.In order to overcome these two drawbacks,this paper proposed IMSQ(using Index in MapReduce to Segment and Query)algorithm to perform distributed RDF query.The algorithm can be divided into segment and query execution two parts,firstly,makes a starlike segmentation for RDF data,and obtaines several segment file and corresponding index file,secondly,generates a layered join plan,uses filter method to seek the index file to narrow the result set and then does query on corresponding segment file;finally,merges and outputs the middle result.The results of the experiment on the LUBM test data set show that IMSQ method query efficiency is higher when the amount of the RDF data is large.
出处
《计算机科学》
CSCD
北大核心
2014年第11期233-238,共6页
Computer Science
基金
福州大学科技发展基金项目(2013-XQ-32)
空间数据挖掘与信息共享教育部重点实验室开放研究基金项目(201006)
2011年福建省科技拥军基金项目(JG2011005)
福建省自然科学基金项目(2012J01168)资助