Index-Based Distributed RDF Query Optimization Algorithm (cited by: 1)

Distributed Optimized Query Algorithm Based on Index

Abstract: Using index files to assist queries on the Hadoop platform is a new approach to querying massive RDF (Resource Description Framework) data. Existing RDF query methods implemented on Hadoop rarely use index files and mainly target static RDF data, so they cope poorly with dynamic data updates. To overcome these two drawbacks, this paper proposes the IMSQ (using Index in MapReduce to Segment and Query) algorithm for distributed querying of RDF files. The algorithm has two parts, segmentation and query. First, the RDF data is partitioned once in a star-shaped manner, producing a number of segment files together with corresponding index files. Second, at query time a layered join plan is generated and a filter-selection strategy is applied: the index files are consulted first to narrow the set of candidate files, and only the corresponding segment files are then queried. Finally, the intermediate results are merged and output. Experiments on the LUBM data set show that IMSQ has a clear advantage in query efficiency when the data volume is large.
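The segment-then-filter idea in the abstract can be illustrated with a small single-machine sketch. This is not the authors' implementation: in-memory dictionaries stand in for the segment and index files on Hadoop, the index layout (predicate to segment ids) is an assumption, the layered join plan is not modelled, and the names `segment_id`, `star_segment`, and `query_star` are hypothetical.

```python
import zlib
from collections import defaultdict


def segment_id(subject, num_segments):
    # Stable hash so that a subject always maps to the same segment.
    return zlib.crc32(subject.encode("utf-8")) % num_segments


def star_segment(triples, num_segments):
    """Place all triples sharing a subject (one 'star') in the same segment
    and record, per predicate, which segments contain it (assumed index layout)."""
    segments = defaultdict(list)  # segment id -> list of (s, p, o)
    index = defaultdict(set)      # predicate  -> set of segment ids
    for s, p, o in triples:
        seg = segment_id(s, num_segments)
        segments[seg].append((s, p, o))
        index[p].add(seg)
    return segments, index


def query_star(segments, index, pattern):
    """Answer a star query: pattern is a list of (predicate, object_or_None)
    pairs that all share one subject variable. Returns matching subjects."""
    # Filter step: consult the index first; only segments that contain
    # every queried predicate can hold a matching star.
    candidates = None
    for p, _ in pattern:
        segs = index.get(p, set())
        candidates = set(segs) if candidates is None else candidates & segs

    results = []
    for seg in sorted(candidates or ()):
        # Scan only the candidate segments and regroup triples by subject.
        by_subject = defaultdict(set)
        for s, p, o in segments[seg]:
            by_subject[s].add((p, o))
        for s, po in by_subject.items():
            if all(
                any(p == qp and (qo is None or o == qo) for p, o in po)
                for qp, qo in pattern
            ):
                results.append(s)
    # Merge step: combine the per-segment results into one sorted list.
    return sorted(results)
```

Because every triple of a subject lands in the same segment, a star query never needs to join across segments in this sketch, and the index lookup only has to rule out segments that lack one of the queried predicates.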

Authors: WANG Jingbin (汪璟玢), FANG Zhili (方知立)

Affiliation: College of Mathematics and Computer Science, Fuzhou University

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list; 2014, No. 11, pp. 233-238 (6 pages)

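Funding: Supported by the Fuzhou University Science and Technology Development Fund (2013-XQ-32); the Open Research Fund of the Key Laboratory of Spatial Data Mining and Information Sharing, Ministry of Education (201006); the 2011 Fujian Province Science and Technology Military-Support Fund (JG2011005); and the Natural Science Foundation of Fujian Province (2012J01168).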

Keywords: Hadoop; RDF; Index; MapReduce

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

