Abstract
With the development and widespread adoption of network technology, encrypted traffic has become a key means of protecting user privacy. However, malware and attackers also use encrypted traffic to hide their behavior and evade traditional network intrusion detection systems. Existing methods for detecting malicious encrypted traffic have several shortcomings. Methods based on statistical features rely on expert experience for feature extraction, and features designed for one protocol do not generalize to others. Deep learning methods that operate on raw input suffer from data problems such as incomplete information and field padding, and they represent the semantics of encrypted traffic interactions poorly. To address these problems, this paper proposes the conversation statistic encoder model (CSEM). Unlike the traditional approach of feeding byte streams into a deep neural network, the method draws on the transformer encoder and introduces a new way of parsing packet features. It builds a fixed-length vector representation for each packet without zero padding, avoids dependence on any specific encryption protocol during feature extraction, and constructs a hybrid deep neural network, offering a new approach to malicious encrypted traffic detection. The model is validated on the DataCon dataset and a self-built dataset. On the public DataCon dataset it achieves a recall of 0.9911, a precision of 0.9407, and an F1 score of 0.9652 (9% higher than that of a random forest model), reaching the current best level on all of these metrics.
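The three DataCon figures quoted in the abstract can be cross-checked against each other, since the F1 score is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that check; the helper name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the DataCon dataset.
precision = 0.9407
recall = 0.9911

f1 = f1_score(precision, recall)
print(round(f1, 4))  # prints 0.9652
```

The result agrees with the reported F1 score of 0.9652 to four decimal places.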
Authors
GONG Siyue (巩思越)
LIU Hui (刘辉)
WANG Baohui (王宝会)
College of Software, Beihang University, Beijing 100000, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2024, Issue 11, pp. 340-346 (7 pages)
Keywords
Conversation
Encrypted traffic detection
Encoder