Abstract
Existing encrypted traffic identification methods have difficulty distinguishing encrypted traffic from non-encrypted compressed-file traffic. An analysis of encrypted, txt, doc, jpg, and compressed-file traffic on the Internet shows that methods based on information entropy can effectively separate low-entropy flows from high-entropy flows. However, such methods cannot recognize non-encrypted compressed-file traffic, in which each byte appears random while the flow as a whole is only pseudo-random. Therefore, the relative entropy feature vector {h_0, h_1, h_2, h_3} is used to separate low-entropy flows from high-entropy flows, and the error p_error of the π value estimated by Monte Carlo simulation is used to distinguish locally random traffic from globally random traffic. Finally, a support vector machine (SVM) based identification method for encrypted and non-encrypted traffic, SVM-ID, is proposed; it takes the feature subspace SVM = {h_0, h_1, h_2, h_3, p_error} as its input. SVM-ID is compared experimentally with the relative entropy method. The results show that the proposed method not only identifies encrypted traffic well but also distinguishes encrypted traffic from non-encrypted compressed-file traffic.
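The features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define h_0 to h_3 or p_error precisely, so the definitions below are assumptions: h_0 to h_3 are taken to be the relative entropy (KL divergence from the uniform distribution, normalized to [0, 1]) of a payload split into 1-, 2-, 4- and 8-bit symbols, and p_error is taken to be the relative error of a Monte Carlo estimate of π in the style of the `ent` randomness test, which reads consecutive 6-byte groups as pairs of 24-bit coordinates. The names `SYMBOL_WIDTHS`, `normalized_relative_entropy`, `monte_carlo_pi_error` and `feature_vector` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical symbol widths (in bits) behind h_0..h_3; the abstract does not
# say how the four relative-entropy features are actually defined.
SYMBOL_WIDTHS = (1, 2, 4, 8)


def symbol_counts(data: bytes, width: int) -> Counter:
    """Count the width-bit symbols obtained by slicing every byte MSB-first."""
    mask = (1 << width) - 1
    counts: Counter = Counter()
    # Count byte values once, then expand each value into its sub-symbols.
    for value, occurrences in Counter(data).items():
        for shift in range(8 - width, -1, -width):
            counts[(value >> shift) & mask] += occurrences
    return counts


def normalized_relative_entropy(data: bytes, width: int) -> float:
    """KL divergence from the uniform distribution over width-bit symbols,
    divided by its maximum (width bits): 0 for uniform data, 1 for constant data."""
    counts = symbol_counts(data, width)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # D_KL(P || U) = log2(2**width) - H(P) = width - H(P)
    return (width - entropy) / width


def monte_carlo_pi_error(data: bytes, coord_bytes: int = 3) -> float:
    """ent-style Monte Carlo test: each group of 2*coord_bytes bytes becomes a
    point (x, y) in the square [0, R]^2 with R = 256**coord_bytes - 1; the share
    of points with x^2 + y^2 <= R^2 (a quarter disk) estimates pi/4.
    Returns the relative error |pi_est - pi| / pi (assumed meaning of p_error)."""
    step = 2 * coord_bytes
    n_points = len(data) // step
    if n_points == 0:
        raise ValueError("payload too short for a single Monte Carlo point")
    radius_sq = (256 ** coord_bytes - 1) ** 2
    inside = 0
    for i in range(n_points):
        chunk = data[i * step:(i + 1) * step]
        x = int.from_bytes(chunk[:coord_bytes], "big")
        y = int.from_bytes(chunk[coord_bytes:], "big")
        if x * x + y * y <= radius_sq:
            inside += 1
    pi_est = 4.0 * inside / n_points
    return abs(pi_est - math.pi) / math.pi


def feature_vector(payload: bytes) -> list[float]:
    """Hypothetical [h_0, h_1, h_2, h_3, p_error] vector for one flow payload."""
    if not payload:
        raise ValueError("empty payload")
    features = [normalized_relative_entropy(payload, w) for w in SYMBOL_WIDTHS]
    features.append(monte_carlo_pi_error(payload))
    return features
```

For a long, uniformly random payload all five values come out close to 0. ASCII text instead yields large h values, and because its bytes never exceed 0x7F, every Monte Carlo point falls inside the quarter disk, so the π estimate is 4 and the error is about 0.27. Stacking such vectors for labeled flows into a matrix `X` with labels `y`, an SVM could then be trained, for example with scikit-learn's `sklearn.svm.SVC().fit(X, y)`; the kernel and parameters used by SVM-ID are not stated in the abstract.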
Source
Journal of Southeast University (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journal
2017, No. 4, pp. 655-659 (5 pages)
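Funding
National High Technology Research and Development Program of China (863 Program) (No. 2015AA015603)
National Natural Science Foundation of China (No. 61602114)
ZTE Research Fund
Collaborative Innovation Center of Novel Software Technology and Industrialization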
Keywords
encrypted traffic identification
relative entropy
Monte Carlo simulation
support vector machine