
Identification method of encrypted traffic based on support vector machine (cited by: 16)
Abstract: Existing encrypted traffic identification methods have difficulty distinguishing encrypted traffic from unencrypted compressed-file traffic. An analysis of encrypted, txt, doc, jpg, and compressed-file traffic on the Internet shows that information-entropy-based methods can effectively separate low-entropy flows from high-entropy flows. However, such methods cannot identify unencrypted compressed-file traffic, in which each byte appears random while the flow as a whole is only pseudo-random. Therefore, the relative entropy feature vector {h_0, h_1, h_2, h_3} is used to distinguish low-entropy flows from high-entropy flows, and the error p_error of a Monte Carlo estimate of π is used to distinguish locally random traffic from globally random traffic. On this basis, SVM-ID, a support vector machine (SVM)-based method for identifying encrypted and unencrypted traffic, is proposed, taking the feature subspace SVM = {h_0, h_1, h_2, h_3, p_error} as its input. In comparative experiments against the relative entropy method, the results show that SVM-ID not only identifies encrypted traffic well but also distinguishes encrypted traffic from unencrypted compressed-file traffic.
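The two kinds of feature named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define h_0 to h_3, so plain Shannon byte entropy stands in for the relative entropy features, and the way payload bytes are mapped to points for the Monte Carlo estimate of π (consecutive byte pairs read as coordinates in the unit square) is an assumption. The function names `byte_entropy` and `pi_error` are hypothetical.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8).

    Stand-in for the paper's relative entropy features, whose exact
    definition is not given in the abstract.
    """
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def pi_error(data: bytes) -> float:
    """Relative error of a Monte Carlo estimate of pi driven by the payload.

    Assumption: consecutive byte pairs are read as (x, y) points in the
    unit square; the fraction falling inside the quarter circle of
    radius 1 approximates pi / 4 when the bytes are uniformly random.
    """
    pairs = len(data) // 2
    if pairs == 0:
        raise ValueError("need at least two bytes")
    inside = 0
    for i in range(pairs):
        # Map each byte to the midpoint of its 1/256-wide cell in [0, 1).
        x = (data[2 * i] + 0.5) / 256.0
        y = (data[2 * i + 1] + 0.5) / 256.0
        if x * x + y * y <= 1.0:
            inside += 1
    estimate = 4.0 * inside / pairs
    return abs(estimate - math.pi) / math.pi


if __name__ == "__main__":
    random_payload = os.urandom(200_000)   # proxy for encrypted traffic
    text_payload = b"plain text payload " * 10_000
    for name, payload in (("random", random_payload), ("text", text_payload)):
        print(name, byte_entropy(payload), pi_error(payload))
```

On uniformly random bytes the entropy is close to 8 bits per byte and the π error is small, while structured payloads tend to move one or both values. A feature vector built from such values could then be passed to an SVM classifier, for example scikit-learn's `SVC`, although the abstract does not say which SVM implementation or kernel was used.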
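Authors: CHENG Guang (程光), CHEN Yuxiang (陈玉祥)
Source: Journal of Southeast University (Natural Science Edition), 2017, No. 4, pp. 655-659 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National High Technology Research and Development Program of China (863 Program) (2015AA015603); National Natural Science Foundation of China (61602114); ZTE research fund; Collaborative Innovation Center of Novel Software Technology and Industrialization.
Keywords: encrypted traffic identification; relative entropy; Monte Carlo simulation; support vector machine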