摘要
Classification of network traffic is the essential step for many network researches. However, with the rapid evolution of Internet applications the effectiveness of the port-based or payload-based identification approaches has been greatly diminished in recent years. And many researchers begin to turn their attentions to an alternative machine learning based method. This paper presents a novel machine learning-based classification model, which combines ensemble learning paradigm with co-training techniques. Compared to previous approaches, most of which only employed single classifier, multiple classifters and semi-supervised learning are applied in our method and it mainly helps to overcome three shortcomings: limited flow accuracy rate, weak adaptability and huge demand of labeled training set. In this paper, statistical characteristics of IP flows are extracted from the packet level traces to establish the feature set, then the classification model is crested and tested and the empirical results prove its feasibility and effectiveness.
Classification of network traffic is the essential step for many network researches. However, with the rapid evolution of Internet applications the effectiveness of the port-based or payload-based identification approaches has been greatly diminished in recent years. And many researchers begin to turn their attentions to an alternative machine learning based method. This paper presents a novel machine learning-based classification model, which combines ensemble learning paradigm with co-training techniques. Compared to previous approaches, most of which only employed single classifier, multiple classifters and semi-supervised learning are applied in our method and it mainly helps to overcome three shortcomings: limited flow accuracy rate, weak adaptability and huge demand of labeled training set. In this paper, statistical characteristics of IP flows are extracted from the packet level traces to establish the feature set, then the classification model is crested and tested and the empirical results prove its feasibility and effectiveness.
基金
Supported by the National Natural Science Foundation of China (Grant Nos.60525213 and 60776096)
the National Basic Research Program of China (Grant No.2006CB303106)
the National High-Tech Research & Development Program of China (Grant Nos.2007AA01Z236 and 2007AA01Z449)
the Joint Funds of NSFC-Guangdong (Grant No.U0735001)
the National Project of Scientific and Technical Supporting Programs (Grant No.2007BAH13B01)