
Relational Query Techniques for Distributed Data Stream: A Survey (cited by: 21)
Abstract: Demand for online analysis of continuous data streams is growing, and it has given rise to data stream management systems that process massive, volatile data in real time. In the era of big data, with the development of open processing platforms, several distributed stream processing systems, such as S4, Storm, and Spark Streaming, have emerged to handle large-scale and diverse data streams. To improve the usability and processing capability of these systems, however, a relational query system with an abstract query language needs to be built on top of them, so as to form a complete distributed data stream management system. How to design and implement an efficient, easy-to-use relational query system is a pressing open problem. This survey first gives an overview of the typical applications, data characteristics, and implementation goals of distributed data stream query processing. It then proposes a basic architecture for distributed data stream relational query systems and, based on this architecture, analyzes the key techniques in depth: user-defined function (UDF) queries, query optimization, driving modes, compilation techniques, operator management, scheduling management, and parallelism management. Next, it compares four representative query systems: SPL, StreamingSQL, Squall, and DBToaster. Finally, it identifies the challenges the field faces and directions for future research in optimization techniques, execution strategies, real-time precise queries, and complex query analysis.
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2016, No. 1, pp. 80-96 (17 pages)
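Funding: Supported by the National Natural Science Foundation of China (61379050, 91224008), the National High Technology Research and Development Program of China ("863" Program) (2013AA013204), the Specialized Research Fund for the Doctoral Program of Higher Education (20130004130001), and the Research Funds of Renmin University of China (11XNL010).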
Keywords: big data; data stream; stream processing system; stream query system; relational query technique

