Massive semantic data query method based on cloud computing


Abstract: To query massive RDF data efficiently, the storage of RDF triples in the distributed database HBase was studied, and a two-phase MapReduce-based query strategy for massive RDF data was designed. The strategy divides a query into a SPARQL preprocessing phase and a distributed query execution phase. In the SPARQL preprocessing phase, JOVR, a query partitioning algorithm based on the correlation between SPARQL variables, was designed and implemented. JOVR computes the correlation between the variables in a SPARQL query to determine the join order of the join variables, and then uses those join variables to partition the joins between SPARQL clauses into the minimum number of MapReduce jobs. The distributed query execution phase runs the MapReduce jobs produced by the preprocessing phase, querying the massive RDF data in parallel. The strategy was validated on the LUBM benchmark dataset. The results show that JOVR queries massive RDF data efficiently and has good stability and scalability.
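The abstract does not define JOVR's correlation measure or its exact partitioning rule, so the following Python sketch is illustrative only and is not the authors' algorithm. It assumes that a variable's correlation is the number of current inputs (triple patterns or intermediate join results) that mention it, breaks ties by preferring variables that conflict with fewer other join variables, and packs joins whose inputs do not overlap into the same MapReduce job, since each input can be keyed on only one variable per map phase. The names `variables` and `plan_jobs` are hypothetical, and this greedy heuristic is not guaranteed to reach the true minimum number of jobs.

```python
from collections import Counter


def variables(pattern):
    """Variables (terms starting with '?') bound by one SPARQL triple pattern."""
    return frozenset(term for term in pattern if term.startswith("?"))


def plan_jobs(patterns):
    """Greedily pack the joins of a basic graph pattern into MapReduce jobs.

    Hypothetical sketch, not the paper's JOVR. Each entry of `relations` is the
    variable set of one input to the next job: initially one per triple pattern,
    later the results of earlier joins. Returns a list of jobs, each a list of
    the join variables used as reduce keys in that job.
    """
    relations = [variables(p) for p in patterns]
    jobs = []
    while len(relations) > 1:
        # Assumed "correlation": number of current relations containing v.
        corr = Counter(v for rel in relations for v in rel)
        join_vars = [v for v, c in corr.items() if c > 1]
        if not join_vars:
            break  # disconnected pattern: would need a cross product
        # Conflicts: other join variables that share at least one relation with v.
        conflicts = {
            v: sum(1 for w in join_vars
                   if w != v and any(v in rel and w in rel for rel in relations))
            for v in join_vars
        }
        # Highest correlation first, then fewest conflicts, then name for determinism.
        order = sorted(join_vars, key=lambda v: (-corr[v], conflicts[v], v))
        used, job, merged = set(), [], []
        for v in order:
            idx = {i for i, rel in enumerate(relations) if v in rel}
            if idx & used:
                continue  # an input can be keyed on only one variable per job
            used |= idx
            job.append(v)
            # All inputs containing v are joined on v in one reduce phase.
            merged.append(frozenset().union(*(relations[i] for i in idx)))
        # Inputs not joined in this job are carried over unchanged.
        relations = merged + [rel for i, rel in enumerate(relations) if i not in used]
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    # Illustrative LUBM-style chain query (not taken from the paper).
    query = [
        ("?x", "rdf:type", "ub:GraduateStudent"),
        ("?x", "ub:memberOf", "?d"),
        ("?d", "ub:subOrganizationOf", "?u"),
        ("?u", "rdf:type", "ub:University"),
    ]
    print(plan_jobs(query))  # [['?u', '?x'], ['?d']]
```

For this chain query the heuristic joins on `?x` and `?u` in the first job, because their inputs do not overlap, and on `?d` in the second, giving two jobs where joining one variable per job would need three. Choosing `?d` first would block both of its neighbours, which is why the sketch uses the conflict count as a tie-breaker.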

Authors: Hu Zhigang (胡志刚), Jing Dongmei (景冬梅), Chen Bolin (陈柏林), Zheng Meiguang (郑美光), Yang Liu (杨柳)

Affiliation: School of Software, Central South University

Source: Journal of Central South University (Science and Technology), 2017, No. 5, pp. 1218-1226 (9 pages). Indexed in EI, CAS, CSCD, and the Peking University core journal list.

Funding: National Natural Science Foundation of China (Grant Nos. 61301136, 61572525, 61602525)

Keywords: parallel processing; semantic information; query strategy; MapReduce; SPARQL; large-scale RDF

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
