
Research on Key Technologies of Real-time Data Collection and Retrieval for Very Large Scale Network Flow

Cited by: 17
Abstract: With the continuous and explosive growth of the Internet, users face network flow data of massive scale together with strict real-time processing requirements. Meeting these challenges requires solving key problems in data collection, real-time processing, storage organization, and query retrieval. This paper proposes a distributed real-time data aggregation and query platform. The platform collects very large-scale network flow data through a hierarchical architecture that operates in a semi-synchronous, semi-asynchronous mode. It achieves efficient real-time processing through an optimized combination of message caching in multi-partition queues, parallel distributed stream processing, and data loading based on attribute partitioning. It manages heterogeneous data uniformly, with good scalability, through virtual-partition data storage built on an abstract data access driver. It retrieves data packets quickly through an asynchronously constructed hierarchical index. Together, these components provide users with an integrated system that offers low latency, high throughput, and fast queries. Experiments show that the platform performs well and scales well, with the main stages achieving performance improvements of varying degrees, several-fold or more. The platform has been applied in practical systems.
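The abstract names message caching in multi-partition queues and data loading based on attribute partitioning, but it does not give their details. The following is a minimal sketch that assumes records are routed by hashing a flow's 5-tuple. Under that assumption, all records of one flow land in the same partition, and an independent loader can drain each partition. `FlowRecord`, `partition_of`, and `PartitionedQueue` are hypothetical names, and the in-memory deque stands in for a real message queue. This is not the paper's implementation.

```python
import hashlib
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical record layout; the paper does not specify its schema.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    payload_len: int


def partition_of(record: FlowRecord, num_partitions: int) -> int:
    """Map a record to a partition by hashing its 5-tuple (stable across processes)."""
    key = "|".join(
        str(v) for v in (record.src_ip, record.dst_ip, record.src_port,
                         record.dst_port, record.protocol)
    )
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_partitions


class PartitionedQueue:
    """In-memory stand-in for a multi-partition message queue (assumption)."""

    def __init__(self, num_partitions: int):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.partitions = [deque() for _ in range(num_partitions)]

    def publish(self, record: FlowRecord) -> int:
        """Append the record to its attribute-derived partition; return the partition id."""
        p = partition_of(record, len(self.partitions))
        self.partitions[p].append(record)
        return p

    def drain(self, p: int, max_batch: int) -> list:
        """Pop up to max_batch records from partition p, as one loader worker would."""
        q = self.partitions[p]
        batch = []
        while q and len(batch) < max_batch:
            batch.append(q.popleft())
        return batch
```

The partition is a pure function of the flow key, so loaders working on different partitions never contend for the same flow. The sketch uses a stable hash (BLAKE2b) rather than Python's built-in `hash()`, because the built-in hash of a string is randomized per process and would route the same flow differently after a restart.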
Authors: GUO Qing, ZHU Yi-fan, XIE Ying-ying, ZHANG Yu, CHEN Xiao-bing (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Bigdata Department, Dawning Information Industry Co., Ltd., Beijing 100193, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2020, No. 6, pp. 1314-1320 (7 pages)
Funding: Supported by the National Key Research and Development Program of China (2016YFC0802602).
Keywords: network flow data; large-scale data collection; real-time processing; abstract data access driver; hierarchical index
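The abstract says the platform speeds up packet retrieval with a hierarchical index built asynchronously, but it does not specify the index levels. The following is a minimal sketch under explicit assumptions. It uses two levels: a coarse time bucket, then a per-key list of log offsets. A background thread indexes newly appended records off the write path. `TwoLevelIndex`, `run_indexer`, and the bucket size are hypothetical. Queries also scan the not-yet-indexed tail, so results stay complete while the indexer lags.

```python
import threading
from collections import defaultdict


class TwoLevelIndex:
    """Hypothetical index: level 1 = time bucket, level 2 = key -> log offsets."""

    def __init__(self, bucket_seconds: int):
        self.bucket_seconds = bucket_seconds
        self.log = []          # raw appended records: (ts, key, payload)
        self.indexed_upto = 0  # log offsets below this value are indexed
        self.levels = defaultdict(lambda: defaultdict(list))
        self.lock = threading.Lock()

    def append(self, ts: int, key: str, payload) -> None:
        # Write path: append only; indexing is deferred to the background thread.
        with self.lock:
            self.log.append((ts, key, payload))

    def build_pending(self) -> None:
        # Index every record appended since the last build.
        with self.lock:
            end = len(self.log)
            for off in range(self.indexed_upto, end):
                ts, key, _ = self.log[off]
                self.levels[ts // self.bucket_seconds][key].append(off)
            self.indexed_upto = end

    def query(self, key: str, t_start: int, t_end: int) -> list:
        with self.lock:
            hits = []
            first = t_start // self.bucket_seconds
            last = t_end // self.bucket_seconds
            # Level 1 narrows to buckets; level 2 narrows to offsets for this key.
            for bucket in range(first, last + 1):
                for off in self.levels.get(bucket, {}).get(key, []):
                    ts, _, payload = self.log[off]
                    if t_start <= ts <= t_end:
                        hits.append(payload)
            # Scan the unindexed tail so results are complete before the indexer catches up.
            for ts, k, payload in self.log[self.indexed_upto:]:
                if k == key and t_start <= ts <= t_end:
                    hits.append(payload)
            return hits


def run_indexer(index: TwoLevelIndex, stop: threading.Event, interval: float = 0.05) -> None:
    # Background loop: periodically index new records until asked to stop.
    while not stop.wait(interval):
        index.build_pending()
    index.build_pending()  # final flush on shutdown
```

With this design, writers pay only for an append, and the cost of indexing moves to the background thread. For simplicity, the sketch holds one lock while indexing. A production design would shrink that critical section so that indexing does not block appends.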