期刊文献+

云环境下海量语义数据的查询策略

Massive semantic data query method based on cloud computing
下载PDF
导出
摘要 为了实现对海量RDF数据的高效查询,研究RDF数据在分布式数据库HBase中的存储方法。基于MapReduce设计海量RDF数据的两阶段查询策略,将查询分为SPARQL预处理阶段与分布式查询执行阶段。SPARQL预处理阶段设计实现基于SPARQL变量关联度的查询划分算法JOVR,通过计算SPARQL查询语句中变量的关联度确定连接变量的连接顺序,根据连接变量将SPARQL子句连接操作划分到最小数量的MapReduce任务中;分布式查询执行阶段执行SPARQL预处理阶段划分的MapReduce任务,实现对海量RDF数据的并行查询。采用LUBM标准测试数据集对查询策略予以验证。研究结果表明:JOVR算法能够高效地实现对海量RDF数据的查询,并具有较强的稳定性与可扩展性。 In order to achieve the efficient query for large-scale RDF data, the storage method of RDF triples in HBase was analyzed and a two-phase query strategy for large-scale RDF data was designed based on MapReduce, which was divided into two stages, i.e. the SPARQL pretreatment stage and the distributed query execution stage. In the SPARQL pretreatment stage, a SPARQL query classification algorithm-JOVR was implemented, which determined the join order of connection variables by calculating the correlation between the variables in a SPARQL query statement, and then the join between SPARQL clauses was divided into the minimum number of MapReduce jobs according to the connection variables. The distributed query execution phase accomplished large-scale RDF data query concurrently based on MapRdecue jobs from SPARQL pretreatment stage. The strategy was verified by LUMB benchmark set. The results show that JOVR can query large-scale RDF data efficiently with strong stability and scalability.
出处 《中南大学学报(自然科学版)》 EI CAS CSCD 北大核心 2017年第5期1218-1226,共9页 Journal of Central South University:Science and Technology
基金 国家自然科学基金资助项目(61301136 61572525 61602525)~~
关键词 并行处理 语义信息查询策略 MAPREDUCE SPARQL 海量RDF parallel processing semantic information query strategy MapReduee SPARQL large-scale RDF
  • 相关文献

参考文献3

二级参考文献210

  • 1[OL].<http://hadoop.apache.org.>.
  • 2WinterCorp: 2005 TopTen Program Summary. http:// www. wintercorp, com/WhitePapers/WC TopTenWP. pdf.
  • 3TDWI Checklist Report: Big Data Analytics. http://tdwi. org/research/2010/08/Big-Data-Analytics, aspx.
  • 4Chaudhuri S, Dayal U. An overview of data warehousing and OLAP technology. SIGMOD Rec, 1997,26(1): 65-74.
  • 5Madden S, DeWitt D J, Stonebraker M. Database parallelism choices greatly impact scalability. DatabaseColumn Blog. http://www, databasecolumn, com/2007/10/database-parallelism-choices, html.
  • 6Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters//Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI ' 04). San Francisco, California, USA, 2004: 137-150.
  • 7DeWitt D J, Gerber R H, Graefe G, Heytens M L, Kumar K B, Muralikrishna M. GAMMA--A high performance dataflow database machine//Proceedings of the 12th International Conference on Very Large Data Bases (VLDB' 86). Kyoto, Japan, 1986:228-237.
  • 8Fushimi S, Kitsuregawa M, Tanaka H. An overview of the system software of a parallel relational database machine// Proceedings of the 12th International Conference on Very Large DataBases(VLDB'86). Kyoto, Japan, 1986:209-219.
  • 9Brewer E A. Towards robust distributed systems//Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC' 00). Portland, Oregon, USA, 2000:7.
  • 10http: //www. dbms2, com/2008/08/26/known-applications of mapreduce/.

共引文献2825

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部