Abstract
With the rapid growth of network data traffic, efficient traffic classification techniques are needed for network management, flow control, and security detection. Traditional port-based and payload-based classification methods have low accuracy, while unsupervised learning methods often apply only a single clustering algorithm and rarely study how the data itself is processed. To address these problems, a method is proposed that first reduces the dimensionality of the original data using the gain ratio (GainRatio) measure and then clusters the reduced data. Experimental results show that the proposed method not only improves running efficiency effectively but also, as the number of clusters increases, markedly speeds up convergence to high accuracy.
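The two-step idea in the abstract (rank features by gain ratio, keep the best ones, then cluster the reduced data) can be illustrated with a minimal sketch. It assumes discretised features and a labelled sample, since gain ratio is computed against class labels; the abstract does not say how features are binned, how many are kept, or which clustering algorithm is used, so the helper names, the toy flows, and `k=1` below are all hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy once the feature value is known.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature carries no information
        return 0.0
    return (entropy(labels) - conditional) / split_info


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the highest gain ratio."""
    n_features = len(rows[0])
    scores = [gain_ratio([row[j] for row in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy flows: (packet-size bin, port bin, constant flag). Not data from the paper.
rows = [("small", "low", 0), ("small", "high", 0), ("large", "low", 0), ("large", "high", 0)]
labels = ["web", "web", "p2p", "p2p"]

kept = select_top_k(rows, labels, k=1)
# Reduced data that would then be passed to a clustering algorithm of choice.
reduced = [tuple(row[j] for j in kept) for row in rows]
```

In this toy data only the packet-size bin separates the two classes, so it is the single feature kept; the clustering step would then run on `reduced` instead of the full rows.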
Authors
高锐
刘北水
李丹
刘杰
尤博
GAO Rui; LIU Beishui; LI Dan; LIU Jie; YOU Bo (CEPREI, Guangzhou 510610, China)
Source
《电子产品可靠性与环境试验》
2020, Issue S02, pp. 51-55 (5 pages)
Electronic Product Reliability and Environmental Testing
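Funding
2018 Industrial Transformation and Upgrading Fund Project: Capability Building for Testing and Evaluation of Core Information Coding Algorithms
Guangzhou Science and Technology Program, General Project (201804010316)
National Key R&D Program of China (2019YFC0118800)
National Key R&D Program of China (2018YFC1201104)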
Keywords
machine learning
traffic clustering
network security
dimensionality reduction
information gain ratio