
Machine Learning Algorithms for Classifying the Imbalanced Protocol Flows: Evaluation and Comparison

Cited by: 26
Abstract: When protocol flows are imbalanced, shifts in the distribution of flow samples strongly affect the accuracy and stability of traffic classifiers based on machine learning. Choosing a machine learning algorithm suited to online traffic classification under imbalanced protocol flows is therefore especially important. First, a single-factor experimental design verifies that, for three classification algorithms (C4.5 decision tree, Naive Bayes with kernel density estimation (NBK), and support vector machine (SVM)), statistics computed over the first four packets of a TCP connection are sufficient to classify traffic. Next, the performance of the three classifiers is compared: C4.5 has the shortest testing time, and SVM is the most stable. Then, the Bagging algorithm is applied to traffic classification. Experimental results show that Bagging is about as stable as SVM, while its testing and modeling times are close to those of C4.5. Bagging is therefore better suited to classifying traffic online.
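The abstract combines two ideas: per-flow statistics taken from only the first four packets of a TCP connection, and a Bagging ensemble that trains base classifiers on bootstrap samples and combines them by majority vote. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation. The abstract does not name the statistics used, so the hypothetical `first_n_packet_features` assumes packet sizes plus inter-arrival times; the base learner is a one-level decision stump swapped in for C4.5 to keep the example short; the toy flows and the labels "bulk" and "interactive" are invented for illustration and are not data from the paper.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Features = Tuple[float, ...]
Classifier = Callable[[Features], str]


def first_n_packet_features(sizes: Sequence[int], times: Sequence[float], n: int = 4) -> Features:
    """Hypothetical per-flow features (an assumption, not the paper's set):
    sizes of the first n packets followed by the n - 1 inter-arrival times."""
    if len(sizes) < n or len(times) < n:
        raise ValueError(f"need at least {n} packets, got {min(len(sizes), len(times))}")
    gaps = [times[i + 1] - times[i] for i in range(n - 1)]
    return tuple(float(s) for s in sizes[:n]) + tuple(gaps)


def train_stump(X: List[Features], y: List[str]) -> Classifier:
    """One-level decision tree, used here as a stand-in for C4.5: pick the
    single feature/threshold split with the fewest training errors."""
    default = Counter(y).most_common(1)[0][0]
    best = (len(y) + 1, 0, 0.0, default, default)  # (errors, feature, threshold, left, right)
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        # Candidate thresholds: midpoints between adjacent distinct values.
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in cuts:
            left = Counter(label for x, label in zip(X, y) if x[f] <= t)
            right = Counter(label for x, label in zip(X, y) if x[f] > t)
            left_label = left.most_common(1)[0][0] if left else default
            right_label = right.most_common(1)[0][0] if right else default
            errors = (sum(left.values()) - left[left_label]) + (sum(right.values()) - right[right_label])
            if errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label


def bagging(X: List[Features], y: List[str], n_estimators: int = 15, seed: int = 0) -> Classifier:
    """Bagging: train each base model on a bootstrap sample (drawn with
    replacement, same size as the training set); predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x: Features) -> str:
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


# Invented toy flows: (first packet sizes, packet timestamps in seconds, label).
flows = [
    ((1400, 1460, 1500, 1450), (0.0, 0.001, 0.002, 0.0035), "bulk"),
    ((1420, 1480, 1490, 1410), (0.0, 0.002, 0.0035, 0.0045), "bulk"),
    ((1500, 1500, 1440, 1430), (0.0, 0.0015, 0.003, 0.005), "bulk"),
    ((1460, 1400, 1470, 1490), (0.0, 0.001, 0.0025, 0.0035), "bulk"),
    ((60, 80, 40, 100), (0.0, 0.2, 0.45, 0.7), "interactive"),
    ((120, 60, 90, 70), (0.0, 0.3, 0.5, 0.75), "interactive"),
    ((40, 100, 110, 50), (0.0, 0.25, 0.5, 0.8), "interactive"),
    ((90, 40, 60, 120), (0.0, 0.22, 0.44, 0.7), "interactive"),
]
X = [first_n_packet_features(s, t) for s, t, _ in flows]
y = [label for _, _, label in flows]
clf = bagging(X, y)

held_out_bulk = first_n_packet_features((1450, 1470, 1480, 1420), (0.0, 0.001, 0.002, 0.003))
held_out_interactive = first_n_packet_features((70, 90, 50, 80), (0.0, 0.25, 0.5, 0.75))
print(clf(held_out_bulk), clf(held_out_interactive))
```

Averaging many models fitted to different bootstrap samples reduces the variance of unstable learners such as decision trees, which is the usual rationale for Bagging. The number of estimators, the base classifier, and the feature set used in the paper are not stated in the abstract, so the values above are placeholders.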
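Authors: 张宏莉 (Zhang Hongli), 鲁刚 (Lu Gang)
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University core journal list; 2012, Issue 6, pp. 1500-1516 (17 pages)
Funding: National Natural Science Foundation of China (60903166); National Basic Research Program of China (973 Program) (2007CB311101, 2011CB302605); National High Technology Research and Development Program of China (863 Program) (2010AA012504, 2011AA010705)
Keywords: imbalance; feature selection; traffic classification; ensemble learning; single-factor experiment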