Abstract
Measuring and analyzing Internet traffic has long been used to identify network resources and user behavior. With the rapid growth of the Internet and of high-speed network access, however, traffic analysis has become increasingly difficult, because large-scale traffic data requires storage and computing resources to match. To address this, the paper proposes distributed storage of network traffic on the Hadoop platform together with multi-layer parallel computation of flow features. In an experiment on 10 nodes, 37 candidate network flow features were computed over 2 TB of flow trace files. The results show that Hadoop-based distributed storage and computation greatly increases the processing speed for large-scale network flows, and that the time needed for traffic analysis and feature computation remains very stable as the traffic volume grows.
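The abstract does not say how the flow features are computed, so the following is only a minimal single-process sketch of the general map/reduce pattern that Hadoop jobs follow, not the authors' implementation. The packet record fields (`src`, `dst`, `sport`, `dport`, `proto`, `ts`, `size`), the function names, and the four features shown are all assumptions chosen for illustration; the paper's 37 candidate features are not listed in this record.

```python
from collections import defaultdict


def map_packet(pkt):
    """Map step: key each packet by a hypothetical 5-tuple flow identifier."""
    key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
    return key, (pkt["ts"], pkt["size"])


def reduce_flow(values):
    """Reduce step: derive a few illustrative features from one flow's packets."""
    times = [ts for ts, _ in values]
    sizes = [size for _, size in values]
    return {
        "packets": len(sizes),
        "bytes": sum(sizes),
        "duration": max(times) - min(times),
        "mean_size": sum(sizes) / len(sizes),
    }


def flow_features(packets):
    """Group mapped packets by flow key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for pkt in packets:
        key, value = map_packet(pkt)
        groups[key].append(value)
    return {key: reduce_flow(vals) for key, vals in groups.items()}
```

On a real cluster the grouping step would be handled by the framework's shuffle phase, and each reduce call could run on a different node; here everything runs in one process so the logic is easy to follow.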
Authors
邓河
贺宗梅
Deng He; He Zongmei (School of Software, Changsha Social Work College, Changsha, Hunan 410004, China)
Source
《信息与电脑》
2019, No. 7, pp. 75-76 (2 pages)
Information & Computer
Funding
Scientific Research Project of the Hunan Provincial Department of Education (Project No. 15C0081)