
NTCI-Flow: A Scalable High-speed Network Traffic Processing Framework (cited by 10)

Abstract: Existing software- and hardware-based flow export technologies suffer from data distortion and poor scalability. To address these problems, this paper presents NTCI-Flow, an accurate, general-purpose and easily scalable high-speed network traffic processing framework. First, high-performance packet capture is implemented on PF_RING DNA. Packets are grouped and distributed by a load-balancing strategy based on the packet five-tuple. Batching, lock-free queues and multithreading are then used to encapsulate multiple packets into a single large message that is sent in parallel, which improves packet forwarding performance. Second, the Kafka messaging system serves as middleware that receives and buffers the packets, enabling distributed packet import. Third, a real-time stream processing platform is built on Storm, and a distributed flow reassembly application is developed and deployed on it. The application reads packets from Kafka, parses and extracts the five-tuple, packet size, timestamp and related fields, and reassembles the packets into network flows. Finally, a Hive flow data import module writes the exported flow records to HDFS in Parquet format in real time, with Hive Metastore storing and managing the metadata. A time-based dynamic partitioning mechanism reduces unnecessary disk I/O when data are retrieved by time.

Experimental results show that the traffic capture module captures and forwards 10 Gbps traffic accurately. Even when all 10 Gbps of traffic consists of minimum-size (60-byte) packets, the packet loss rate is only 0.03%. The throughput of the traffic import module depends on disk write performance and reaches 775 MB/s when 7 disks are used to buffer data. The distributed flow reassembly module is general and scalable, and it reaches a throughput of 1.26×10^7 packets/s through simple configuration.

NTCI-Flow is currently used to collect and process the egress traffic of an institution. That traffic averages about 3.5 Gbps, peaks at 6 Gbps and reaches on the order of one million packets per second. In this deployment NTCI-Flow runs well, and the flow data it produces are more accurate than those obtained with NetStream.
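
The five-tuple load-balancing step can be illustrated with a minimal Python sketch. The abstract does not specify the hash function or whether the hash is symmetric. Here, `zlib.crc32` over a canonical IPv4 key is an assumption, and the two endpoints are sorted so that both directions of a connection land on the same queue. `flow_key` and `pick_queue` are hypothetical names and are not part of PF_RING or NTCI-Flow.

```python
import socket
import struct
import zlib


def flow_key(src_ip: str, dst_ip: str, src_port: int, dst_port: int, proto: int) -> bytes:
    """Canonical byte key for an IPv4 five-tuple.

    Endpoints (address + port) are sorted so A->B and B->A yield the same key;
    this symmetric variant is an assumption, not taken from the paper.
    """
    a = socket.inet_aton(src_ip) + struct.pack("!H", src_port)
    b = socket.inet_aton(dst_ip) + struct.pack("!H", dst_port)
    lo, hi = sorted((a, b))
    return lo + hi + struct.pack("!B", proto)


def pick_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
               proto: int, n_queues: int) -> int:
    """Map a packet to one of n_queues so that all packets of a flow share a queue."""
    key = flow_key(src_ip, dst_ip, src_port, dst_port, proto)
    return zlib.crc32(key) % n_queues


if __name__ == "__main__":
    q_fwd = pick_queue("10.0.0.1", "192.168.1.9", 51234, 443, 6, 8)
    q_rev = pick_queue("192.168.1.9", "10.0.0.1", 443, 51234, 6, 8)
    print(q_fwd, q_rev, q_fwd == q_rev)
```

In this sketch the queue index depends only on the five-tuple. Every packet of a flow therefore reaches the same downstream consumer, so later stages can reassemble flows without coordinating across workers.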
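
Packing many packets into one large message likely serves to amortize per-message overhead when forwarding to Kafka. The sketch below makes two assumptions: a simple 4-byte length-prefix framing (the paper does not describe its message format), and Python's `queue.Queue`, which is lock-based, standing in for the lock-free queue mentioned in the abstract. `pack_batch`, `unpack_batch` and `drain` are hypothetical helpers.

```python
import queue
import struct
from typing import List

# Hypothetical framing: each packet is preceded by its length as a 4-byte big-endian integer.
HEADER = struct.Struct("!I")


def pack_batch(packets: List[bytes]) -> bytes:
    """Concatenate packets into a single message, each prefixed by its length."""
    return b"".join(HEADER.pack(len(p)) + p for p in packets)


def unpack_batch(message: bytes) -> List[bytes]:
    """Inverse of pack_batch: split one batched message back into packets."""
    packets: List[bytes] = []
    offset = 0
    while offset < len(message):
        (length,) = HEADER.unpack_from(message, offset)
        offset += HEADER.size
        packets.append(message[offset:offset + length])
        offset += length
    return packets


def drain(q: "queue.Queue[bytes]", max_batch: int, timeout: float = 0.01) -> List[bytes]:
    """Collect up to max_batch packets: wait briefly for the first, then take what is ready."""
    batch: List[bytes] = []
    try:
        batch.append(q.get(timeout=timeout))
        while len(batch) < max_batch:
            batch.append(q.get_nowait())
    except queue.Empty:
        pass
    return batch


if __name__ == "__main__":
    q: "queue.Queue[bytes]" = queue.Queue()
    for pkt in (b"\x01" * 60, b"\x02" * 1500, b"\x03" * 60):
        q.put(pkt)
    msg = pack_batch(drain(q, max_batch=64))
    print(len(msg), [len(p) for p in unpack_batch(msg)])
```

In a multithreaded arrangement, capture threads would `put` packets and one or more sender threads would loop on `drain` and `pack_batch`. The batch size then trades latency for throughput.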
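
The reassembly step inside the Storm topology amounts to grouping parsed packets by five-tuple and accumulating per-flow counters. The Python sketch below shows that core logic outside Storm. It assumes a NetFlow-style inactive timeout decides when a flow is complete, because the abstract does not state NTCI-Flow's flow-termination rules. `FlowRecord`, `FlowTable` and `idle_timeout` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (src_ip, dst_ip, src_port, dst_port, proto)
FiveTuple = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    key: FiveTuple
    first_ts: float
    last_ts: float
    packet_count: int = 0
    byte_count: int = 0


class FlowTable:
    """Aggregates parsed packets (five-tuple, size, timestamp) into flow records.

    A flow is exported once it has been idle for longer than idle_timeout seconds
    (an assumed inactive-timeout rule).
    """

    def __init__(self, idle_timeout: float = 15.0) -> None:
        self.idle_timeout = idle_timeout
        self.active: Dict[FiveTuple, FlowRecord] = {}

    def add(self, key: FiveTuple, size: int, ts: float) -> None:
        """Account one packet to its flow, creating the flow on first sight."""
        rec = self.active.get(key)
        if rec is None:
            rec = self.active[key] = FlowRecord(key, ts, ts)
        rec.first_ts = min(rec.first_ts, ts)
        rec.last_ts = max(rec.last_ts, ts)
        rec.packet_count += 1
        rec.byte_count += size

    def expire(self, now: float) -> List[FlowRecord]:
        """Remove and return every flow idle for longer than idle_timeout at time `now`."""
        done = [r for r in self.active.values() if now - r.last_ts > self.idle_timeout]
        for r in done:
            del self.active[r.key]
        return done
```

One plausible arrangement is to route tuples to bolts with a Storm fields grouping on the five-tuple. That keeps each flow on a single bolt task, so a table like this can stay task-local.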
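
Time-based partitioning lets a query restricted to a time window skip every other partition directory. The sketch below assumes hourly partitions with columns named `dt` and `hour`; the abstract gives neither the granularity nor the column names, so both are hypothetical. It shows which Hive-style directory a flow is written to and which directories a time-range query would have to read. In Hive itself, a dynamic-partition insert takes these values from the data at write time; making all partition columns dynamic requires `hive.exec.dynamic.partition.mode=nonstrict`.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def _dir(t: datetime) -> str:
    """Hive-style partition path for the hour containing t (hypothetical dt/hour columns)."""
    return f"dt={t:%Y-%m-%d}/hour={t:%H}"


def partition_for(ts: float) -> str:
    """Partition directory for a flow whose start timestamp is ts (Unix seconds, UTC)."""
    return _dir(datetime.fromtimestamp(ts, tz=timezone.utc))


def partitions_between(start: float, end: float) -> List[str]:
    """Partitions a query over [start, end) must scan; all other partitions are pruned."""
    if end <= start:
        return []
    t = datetime.fromtimestamp(start, tz=timezone.utc).replace(minute=0, second=0, microsecond=0)
    stop = datetime.fromtimestamp(end, tz=timezone.utc)
    parts: List[str] = []
    while t < stop:
        parts.append(_dir(t))
        t += timedelta(hours=1)
    return parts
```

For example, a query over 13:45 to 15:10 on one day would touch only three hourly directories. The rest of that day's data stays unread, which is the disk-I/O saving the abstract refers to.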
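
The minimum-packet case is the worst case in packets per second, which gives context for the 0.03% loss figure. This calculation assumes the paper's 60-byte packets exclude the 4-byte Ethernet FCS, i.e. 64-byte minimum frames. Each packet then also occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire. The hypothetical helpers below compute the theoretical line rate and the drop rate that a given loss fraction corresponds to.

```python
# Per-packet Ethernet overhead on the wire, in bytes (assumes packet size excludes the FCS).
FCS_BYTES = 4
PREAMBLE_SFD_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def line_rate_pps(link_gbps: float, packet_bytes: int) -> float:
    """Theoretical maximum packets per second for back-to-back packets of one size."""
    wire_bits = (packet_bytes + FCS_BYTES + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_gbps * 1e9 / wire_bits


def dropped_pps(link_gbps: float, packet_bytes: int, loss_rate: float) -> float:
    """Packets per second lost at a given loss fraction when the link runs at line rate."""
    return line_rate_pps(link_gbps, packet_bytes) * loss_rate


if __name__ == "__main__":
    print(f"{line_rate_pps(10, 60):,.0f} pps at line rate")
    print(f"{dropped_pps(10, 60, 0.0003):,.0f} pps dropped at 0.03% loss")
```

Under these assumptions, 10 Gbps of 60-byte packets is about 14.88 million packets/s. A 0.03% loss rate therefore corresponds to roughly 4,460 dropped packets per second.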
Source: Journal of Sichuan University (Engineering Science Edition), 2017, Issue S1 (supplement), pp. 168-174 (7 pages). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (Grant No. 61272447)
Keywords: packet capture; big data; distributed; Storm; flow reassembly