Abstract
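Traffic classification for massive data is increasingly important and has become a foundational supporting technology for fields such as network resource scheduling and network information security. Because unsupervised machine learning requires no manual labeling of traffic data and is flexible and general, it has become a core class of algorithms widely used by network traffic classification researchers. However, a comprehensive and in-depth analysis of the related research results is still lacking, which constrains both the application of existing algorithms and further research and innovation. Centering on the research progress of unsupervised machine learning in network traffic classification, this paper summarizes the research on unsupervised machine learning algorithms for network traffic classification and compares them in terms of the protocol types classified, the feature parameters used, and the validity of the results. Finally, regarding research directions for unsupervised machine learning algorithms in traffic classification, it offers new research ideas on feature extraction methods and the handling of imbalanced data.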
Traffic classification for massive data is increasingly important and has become a basic supporting technology in fields such as network resource scheduling and network information security. Unsupervised machine learning has become a core class of algorithms widely used by network traffic classification researchers because it does not require manual labeling of traffic data and is flexible and general. However, a comprehensive and in-depth analysis of the related research results is lacking, which has restricted the application of existing algorithms as well as further research and innovation. Traffic classification is an important means of realizing network management and security, and it is a current research hotspot in the field of network information security. This paper focuses on the research on unsupervised machine learning algorithms in network traffic classification, and compares and analyzes the protocol types, feature parameters, and results used in algorithm classification. Finally, future research directions for unsupervised machine learning algorithms in the field of traffic classification are put forward, such as the refined classification of encrypted traffic. This article has reference value for the exploration of new ideas, new methods, and new technologies for network traffic classification.
Authors
王方玉
张建辉
卜佑军
陈博
孙嘉
WANG Fangyu; ZHANG Jianhui; BU Youjun; CHEN Bo; SUN Jia (Department of Zhongyuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450002, China; Information Engineering University, Zhengzhou 450001, China)
Source
《信息工程大学学报》
2020, No. 6, pp. 705-710 (6 pages)
Journal of Information Engineering University
Funding
National Key Research and Development Program of China (2017YFB0803201)
National Natural Science Foundation of China (61572519)
Keywords
unsupervised machine learning
traffic classification
hierarchical learning
clustering algorithm
latent variable model