
Efficient Skyline Query Processing of Massive Data Based on Map-Reduce (cited 44 times)
Abstract: Skyline queries have become a research hotspot in the database and information retrieval communities. At the same time, the amount of data that people collect and use is growing at an astonishing rate, so processing Skyline queries over massive data has become an urgent problem. Map-Reduce is a parallel programming model that processes vast amounts of data on large clusters and is easy to deploy, which makes it well suited to Skyline queries over massive data. This paper studies how to process Skyline queries over massive data on the Map-Reduce framework. A straightforward Map-Reduce implementation of a Skyline query must scan all candidate results before obtaining the final result. However, when the final result is much smaller than the original data set, much of that processing is wasted on objects that cannot appear in the result. The paper therefore proposes a series of efficient Skyline query algorithms and optimizations that effectively filter out data objects which cannot be Skyline results, substantially improving the performance of Skyline query processing over massive data on Map-Reduce. Extensive experiments on Hadoop, an open-source implementation of the Map-Reduce framework, show that the proposed algorithms achieve high efficiency, accuracy and scalability.
Source: Chinese Journal of Computers (计算机学报), 2011, No. 10, pp. 1785-1796 (12 pages). Indexed in EI, CSCD and the Peking University Core Journals list.
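Funding: supported by the Key Program of the National Natural Science Foundation of China (60933001), the National Science Fund for Distinguished Young Scholars (61025007), and the Fundamental Research Funds for the Central Universities (N090304007).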
Keywords: cloud computing; Skyline query; Map-Reduce; massive data; Hadoop
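A Skyline query returns the points that no other point dominates. Taking "smaller is better" in every dimension, p dominates q when p is no worse than q in all dimensions and strictly better in at least one. The abstract does not spell out the paper's algorithms, so the sketch below is not the authors' method. It is a minimal in-process simulation, assuming a common local-then-global scheme: each hypothetical map task computes the skyline of its own partition with block-nested-loops (BNL), and a single hypothetical reduce task merges the local skylines. The functions `map_phase`, `reduce_phase` and `mapreduce_skyline`, the round-robin partitioning and the sample data are illustrative assumptions, not Hadoop APIs. The local step is a safe filter: a point dominated inside its partition is also dominated in the whole data set, so it never needs to reach the reducer.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p dominates q (smaller is better): p <= q in every
    dimension and p < q in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points: Iterable[Point]) -> List[Point]:
    """In-memory block-nested-loops skyline."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so it can never be a skyline point
        # p survives: evict every window point that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def map_phase(partition: List[Point]) -> List[Point]:
    """Hypothetical mapper: emit only the local skyline of its partition."""
    return bnl_skyline(partition)


def reduce_phase(local_skylines: List[List[Point]]) -> List[Point]:
    """Hypothetical single reducer: merge local skylines into the global one."""
    merged = [p for local in local_skylines for p in local]
    return bnl_skyline(merged)


def mapreduce_skyline(data: List[Point], num_partitions: int) -> List[Point]:
    # Round-robin slicing stands in for Hadoop input splits (assumption).
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    return reduce_phase([map_phase(part) for part in partitions])


if __name__ == "__main__":
    # Hypothetical hotels as (price, distance); lower is better for both.
    hotels = [(1, 9), (2, 8), (3, 3), (4, 4), (5, 2), (9, 1), (6, 6), (2, 9)]
    print(sorted(mapreduce_skyline(hotels, 3)))
    # [(1, 9), (2, 8), (3, 3), (5, 2), (9, 1)]
    print(sum(len(map_phase(hotels[i::3])) for i in range(3)))
    # 6: points forwarded to the reducer instead of all 8
```

In this toy run the mappers forward 6 of the 8 input points; on large data sets whose skyline is small, this kind of early filtering is where the savings come from. Funnelling everything through one reducer is a simplifying choice that can become a bottleneck, and the paper's own pruning optimizations are not reproduced here.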