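Abstract
Machine learning algorithms based on statistical features are widely used in network flow identification and classification, but finding an efficient and simple combination of statistical features has long been both the focus and the main difficulty of this research area. To address the excessive dimensionality of feature vectors, a novel feature selection method based on the coefficient of variance is proposed that jointly considers the coefficients of variance between classes and within classes. The method is proved mathematically to have lower computational complexity than information gain and the chi-square test. Experimental results show that the method yields more effective feature combinations and higher overall accuracy than existing methods.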
As services carried over HTTP (web browsing, audio, video) multiply and the threat posed by anonymous traffic grows, traffic classification faces a major challenge. Machine learning algorithms based on statistical features are now widely used in traffic classification, but discovering efficient and simple features remains both the focus and the main difficulty. This paper presents a novel feature selection method based on the coefficient of variance. Compared with existing feature selection methods based on information gain and the chi-square test, the method has lower computational complexity. Six kinds of network multimedia applications are used in the experiments: Skype audio, video streaming, network live TV, HTTP download, web browsing (text and images), and web browsing with video. Experimental results show that the method achieves higher classification accuracy than existing methods. Furthermore, the method is tested on the classification of short-time traffic flows for real-time classification.
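The abstract does not give the scoring formula, so the following is only a minimal sketch of what a feature ranking built on between-class and within-class coefficients of variance might look like. The score used here (coefficient of variance of the class means divided by the average within-class coefficient of variance) is an assumption made for illustration, not the paper's definition, and the names `coefficient_of_variance`, `score_features`, and `select_top_k` are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variance(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        # Assumption: a zero-mean feature is treated as having no usable ratio.
        return 0.0
    return pstdev(values) / abs(m)


def score_features(samples, labels):
    """Return one score per feature column.

    Assumed score (not taken from the paper): spread of the class means
    relative to the average spread inside each class. Larger is better.
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        # Collect the values of feature j separately for each class.
        per_class = [
            [row[j] for row, y in zip(samples, labels) if y == c]
            for c in classes
        ]
        between = coefficient_of_variance([mean(v) for v in per_class])
        within = mean([coefficient_of_variance(v) for v in per_class])
        if within > 0:
            scores.append(between / within)
        elif between > 0:
            scores.append(float("inf"))  # classes differ and have no internal spread
        else:
            scores.append(0.0)
    return scores


def select_top_k(samples, labels, k):
    """Indices of the k highest-scoring features, best first."""
    scores = score_features(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy flows with two features: feature 0 separates the classes, feature 1 does not.
toy_samples = [[1.0, 5.0], [1.1, 1.0], [10.0, 5.0], [10.1, 1.0]]
toy_labels = ["audio", "audio", "video", "video"]
print(select_top_k(toy_samples, toy_labels, k=1))
```

Each feature is scored in a single pass over the samples using only means and standard deviations, which is in keeping with the abstract's claim of low computational cost, although the sketch makes no attempt to reproduce the paper's complexity analysis.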
Source
Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Peking University Core Journal (北大核心)
2016, Issue 6, pp. 81-89 (9 pages)
Funding
Supported by the National Natural Science Foundation of China (61271233, 60972038) and the Huawei HIRP Innovation Program (YB2015070064)
Keywords
traffic classification
web browsing
video
QoS
coefficient of variance
feature selection