
New traffic classification method for imbalanced network data (非平衡网络流量识别方法) Cited by: 8
Abstract To address the traffic imbalance caused by the flood of Peer-to-Peer (P2P) traffic in networks, this paper applies imbalanced-data classification to traffic identification. By introducing and improving the Synthetic Minority Over-sampling Technique (SMOTE) algorithm, a Mean SMOTE (M-SMOTE) algorithm is proposed to balance the traffic data. On this basis, three machine learning classifiers, Random Forest (RF), Support Vector Machine (SVM), and Back Propagation Neural Network (BPNN), are used to identify each type of traffic after balancing. Theoretical analysis and simulation results show that, without affecting the recognition accuracy of P2P traffic, SMOTE improves the recognition accuracy of non-P2P traffic by an average of 16.5 percentage points and raises the overall recognition rate of network traffic by 9.5 percentage points compared with the imbalanced state. Compared with SMOTE, M-SMOTE further improves the recognition accuracy of non-P2P traffic and the overall recognition rate of network traffic by 3.2 and 2.6 percentage points, respectively. The experimental results show that imbalanced-data classification can effectively solve the problem of a low non-P2P traffic recognition rate caused by excessive P2P traffic, and that the proposed M-SMOTE algorithm achieves higher recognition accuracy than SMOTE.
Source Journal of Computer Applications (《计算机应用》), CSCD, PKU Core, 2018, No. 1, pp. 20-25 (6 pages)
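Funding National Science and Technology Major Project (2016ZX01012101); National Natural Science Foundation of China, General Program (61572520); National Natural Science Foundation of China, Innovative Research Group Program (61521003).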
Keywords imbalanced data; Peer-to-Peer (P2P) traffic; traffic classification; machine learning; Synthetic Minority Over-sampling Technique (SMOTE) algorithm
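
The abstract names SMOTE as the baseline step for balancing the traffic data. As a rough illustration only, the sketch below shows plain SMOTE-style oversampling: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic sample at a random point on the segment between them. It is not the paper's M-SMOTE, whose details are not given in this record. The function name `smote`, its parameters, the random choice of base sample, and the toy 2-D "flow features" are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated from minority-class vectors.

    Simplified sketch of standard SMOTE (hypothetical helper, not the
    paper's M-SMOTE): base samples are drawn at random.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position between x and that neighbour.
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical toy data: non-P2P is the minority class (4 samples),
# and we assume 10 P2P samples exist in the majority class.
non_p2p = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p2p_count = 10
new_samples = smote(non_p2p, n_new=p2p_count - len(non_p2p), k=2)
balanced_minority = non_p2p + new_samples
print(len(balanced_minority))
```

After this step the minority class has as many samples as the majority class, so a classifier such as RF, SVM, or BPNN can be trained on the balanced set. Because every synthetic point is a convex combination of two real minority samples, it always lies within the range spanned by the original minority data.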
