
Research on internet traffic classification techniques using supervised machine learning (cited by: 1)

Abstract: Internet traffic classification is vital to network operation and management. Traditional classification methods such as port mapping and payload analysis are becoming increasingly difficult to apply as newly emerged applications (e.g., peer-to-peer) use dynamic port numbers, masquerading techniques and encryption to avoid detection. This paper presents a machine learning (ML) based traffic classification scheme, which offers solutions to a variety of network activities and provides a platform for evaluating the performance of the classifiers. The impact of dataset size, feature selection, number of application types and ML algorithm selection on classification performance is analyzed and demonstrated by experiments, with the following findings: (1) Genetic algorithm based feature selection can dramatically reduce the cost without diminishing classification accuracy. (2) The chosen ML algorithms achieve high classification accuracy; in particular, REPTree and C4.5 outperform the other ML algorithms when computational complexity and accuracy are both taken into account. (3) A larger dataset and fewer application types result in better classification accuracy. Finally, early detection using only the first few packets is proposed for real-time network activity, and preliminary results show it to be feasible.
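The genetic algorithm based feature selection mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic flow features, the nearest-centroid classifier (standing in for the paper's classifiers such as REPTree or C4.5), the fitness penalty per selected feature, and all GA parameters.

```python
import random

def make_flows(n, rng):
    """Synthetic flows (hypothetical): features 0-1 are informative, 2-5 are noise."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(3.0 * label, 1.0), rng.gauss(-3.0 * label, 1.0)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        data.append((informative + noise, label))
    return data

def accuracy(mask, train, test):
    """Nearest-centroid accuracy using only the features whose mask bit is 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[i] for r in rows) / len(rows) for i in idx]
    correct = 0
    for x, y in test:
        dist = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, centroids[c]))
                for c in centroids}
        correct += (min(dist, key=dist.get) == y)
    return correct / len(test)

def fitness(mask, train, test, penalty=0.01):
    """Reward accuracy, charge a small (assumed) cost per selected feature."""
    return accuracy(mask, train, test) - penalty * sum(mask)

def ga_select(train, test, n_features, rng,
              pop_size=20, generations=15, p_mut=0.1):
    """Evolve bit masks: keep the best half, refill by crossover and mutation."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, test), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n_features - 1)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, train, test))

rng = random.Random(0)
flows = make_flows(400, rng)
train, valid = flows[:300], flows[300:]
best = ga_select(train, valid, 6, rng)
print("selected mask:", best)
print("accuracy, selected features:", accuracy(best, train, valid))
print("accuracy, all features:", accuracy([1] * 6, train, valid))
```

Each candidate is a bit mask over the feature set, so the search rewards masks that keep accuracy high while using fewer features. This mirrors the trade-off the abstract describes, without claiming to reproduce the authors' setup.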
Source: High Technology Letters (高技术通讯(英文版)), indexed in EI and CAS, 2009, No. 4, pp. 369-377 (9 pages)
Funding: Supported by the National High Technology Research and Development Programme of China (No. 2005AA121620, 2006AA01Z232), the Zhejiang Provincial Natural Science Foundation of China (No. Y1080935), and the Research Innovation Program for Graduate Students in Jiangsu Province (No. CX07B_ 110zF)
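Keywords: supervised machine learning; traffic classification; feature selection; genetic algorithm (GA); classification methods; machine learning; learning techniques; traffic; Internet; supervision; network solutions; ML algorithms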

