Abstract
The big data era has brought exponential growth in Internet traffic. To manage network resources effectively and improve network security, network traffic must be classified quickly and accurately, which places higher demands on the real-time performance of traffic classification techniques. At present, most research on network traffic classification, both in China and abroad, is carried out in a stand-alone environment, where limited computing resources make it difficult to handle real-time or quasi-real-time classification in high-speed networks. Building on existing research results and drawing on the latest ideas and technologies, this paper proposes a quasi-real-time classification method for large-scale network traffic based on the Spark platform, combining its stream processing framework Spark Streaming with its machine learning library MLlib. Experimental results show that the method maintains high classification accuracy while also delivering good real-time classification capability, meeting the real-time requirements of traffic classification in real networks.
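The abstract does not give implementation details, so the following is only a minimal sketch of the general pattern it describes: a model trained offline is applied to flow records that arrive in small batches. It is written in plain Python and does not use Spark, Spark Streaming, or MLlib. The nearest-centroid classifier, the two features (mean packet size, mean inter-arrival time), the class labels, and all function names are assumptions made for illustration and are not taken from the paper. Batches here are cut by record count, whereas Spark Streaming cuts them by time interval.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples):
    """Offline step: compute one mean feature vector (centroid) per label.

    `samples` is a list of (features, label) pairs. This stands in for
    whatever model the paper trains; the choice is an assumption.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def micro_batches(stream, batch_size):
    """Group an iterable of records into lists of at most `batch_size`."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def classify_stream(centroids, stream, batch_size):
    """Online step: classify each micro-batch as it becomes available."""
    for batch in micro_batches(stream, batch_size):
        yield [classify(centroids, features) for features in batch]


# Hypothetical toy data: (mean packet size in bytes, mean inter-arrival in s).
training = [
    ((1200.0, 0.01), "video"),
    ((1300.0, 0.02), "video"),
    ((80.0, 0.5), "dns"),
    ((100.0, 0.4), "dns"),
]
centroids = train_centroids(training)
incoming = [(1250.0, 0.015), (90.0, 0.45), (1100.0, 0.03)]
results = list(classify_stream(centroids, incoming, batch_size=2))
```

In a Spark-based setting the batching and the per-batch prediction would be handled by the platform across a cluster; the sketch only shows the control flow on a single machine.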
Source
《科研信息化技术与应用》
2016, Issue 2, pp. 25-34 (10 pages)
E-science Technology & Application
Keywords
Spark
traffic classification
large-scale
quasi-realtime
machine learning