摘要
针对网络中存在的对等网络(P2P)流量泛滥导致的流量失衡问题,提出将非平衡数据分类思想应用于流量识别过程。通过引入合成少数类过采样技术(SMOTE)算法并进行改进,提出了均值SMOTE(M-SMOTE)算法,实现对流量数据的平衡化处理。在此基础上分别采用3种机器学习分类器:随机森林(RF)、支持向量机(SVM)、反向传播神经网络(BPNN)对处理后各类流量进行识别。理论分析与仿真结果表明,在不影响P2P流量识别准确率的前提下,与非平衡状态相比,引入SMOTE算法将非P2P流量的识别准确率平均提高了16.5个百分点,将网络流量的整体识别率提高了9.5个百分点;与SMOTE算法相比,M-SMOTE算法将非P2P流量的识别准确率与网络流量的整体识别率分别进一步提高了3.2个百分点和2.6个百分点。实验结果表明,非平衡数据分类思想可有效解决P2P流量过多导致的非P2P流量识别率低的问题,同时所提M-SMOTE算法具有更高的识别准确度。
To solve the problem existing in traffic classification that Peer-to-Peer (P2P) traffic is much more than that of non-P2P, a new traffic classification method for imbalanced network data was presented. By introducing and improving Synthetic Minority Over-sampling Technique (SMOTE) algorithm, a Mean SMOTE (M-SMOTE) algorithm was proposed to realize the balance of traffic data. On the basis of this, throe kinds of machine learning classifiers: Random Forest (RF), Support Vector Machine (SVM), Back Propagation Neural Network (BPNN) were used to identify the various types of traffic. The theoretical analysis and simulation results show that, compared with the imbalanced state, the SMOTE algorithm improves the recognition accuracy of non-P2P traffic by 16.5 percentage points and raises the overall recognition rate of network traffic by 9.5 percentage points. Compared with SMOTE algorithm, the M-SMOTE algorithm further improves the recognition rate of non-P2P traffic and the overall recognition rate of network traffic by 3.2 percentage points and 2. 6 percentage points respectively. The experimental results show that the way of imbalancod data classification can effectively solve the problem of low P2P traffic recognition rate caused by excessive P2P traffic, and the M-SMOTE algorithm has higher recognition accuracy rate than SMOTE.
出处
《计算机应用》
CSCD
北大核心
2018年第1期20-25,共6页
journal of Computer Applications
基金
国家科技重大专项(2016ZX01012101)
国家自然科学基金面上项目(61572520)
国家自然科学基金创新群体项目(61521003).
关键词
非平衡数据
P2P流量
流量识别
机器学习
合成少数类过采样技术算法
imbalanced data
Peer-to-Peer (P2P) traffic
traffic classification
machine learning
Synthetic MinorityOver sampling Technique (SMOTE) algorithm