Prediction model for malicious encrypted traffic with random forests and SHAP (cited by: 1)
Abstract: Encrypted traffic protects users' private information but can also conceal malicious behavior. Detecting malicious encrypted traffic early is key to defending against network attacks (e.g., distributed denial-of-service, eavesdropping, and injection attacks) and to protecting networks from intrusion. Traditional detection methods based on ports or deep packet inspection struggle against sophisticated attacks that use code obfuscation or repackaging, while machine-learning-based methods suffer from high false-alarm rates and decision processes that are hard to interpret. To address these limitations in performance and interpretability, this paper proposes EPMRS, a highly interpretable model for malicious encrypted traffic detection. After data preprocessing (de-duplication, re-encoding, and feature screening), a detection model is built on random forest and compared with 10 mainstream machine learning models, including logistic regression, KNN, and LGBM, using 5-fold cross-validation. The SHAP framework is then used to enhance the interpretability of the detection model at three levels: the overall model, the interaction effects of core risk features, and the decision process for individual samples. Experiments on the MCCCU dataset show that EPMRS detects unknown malicious encrypted traffic with an accuracy of 99.996% and a misidentification rate of 0.0003%, improving performance metrics by an average of 0.287175% to 7.513175% over existing work. The interpretability analysis also identifies session, flow_duration (flow duration), Goodput (effective throughput), and other features as the core risk factors affecting malicious encrypted traffic detection.
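The evaluation protocol named in the abstract (5-fold cross-validation, reported as accuracy and misidentification rate) can be sketched in plain Python. This is an illustrative sketch only, not the paper's code: the abstract does not define the misidentification rate, so it is assumed here to be the false positive rate; the folds are contiguous rather than shuffled or stratified; and the one-feature threshold classifier and toy data are hypothetical stand-ins for the random forest and the MCCCU dataset.

```python
from statistics import mean

def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def accuracy(y_true, y_pred):
    """Fraction of flows whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def false_positive_rate(y_true, y_pred):
    """Share of benign flows (label 0) that were flagged as malicious (label 1)."""
    preds_on_benign = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(preds_on_benign) / len(preds_on_benign) if preds_on_benign else 0.0

def cross_validate(X, y, fit, predict, k=5):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    accs, fprs = [], []
    for train_idx, test_idx in kfold_indices(len(y), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        y_pred = [predict(model, X[i]) for i in test_idx]
        accs.append(accuracy(y_test, y_pred))
        fprs.append(false_positive_rate(y_test, y_pred))
    return mean(accs), mean(fprs)

# Hypothetical stand-in classifier: threshold halfway between the class means.
def fit_threshold(X_train, y_train):
    benign = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    malicious = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    return (mean(benign) + mean(malicious)) / 2

def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else 0

# Toy, well-separated data: odd-indexed flows are "malicious" (label 1).
X = [[10.0 * (i % 2) + (i % 3)] for i in range(20)]
y = [i % 2 for i in range(20)]

mean_acc, mean_fpr = cross_validate(X, y, fit_threshold, predict_threshold, k=5)
print(f"accuracy={mean_acc:.3f}, false positive rate={mean_fpr:.3f}")
```

Any classifier exposing the same `fit`/`predict` pair could be passed to `cross_validate`, which is how a comparison across several models on identical folds would be set up.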
Author: WU Yan (吴燕) (School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China)
Source: Journal of Harbin University of Commerce: Natural Sciences Edition (《哈尔滨商业大学学报(自然科学版)》), CAS, 2024, No. 2, pp. 167-178 (12 pages)
Funding: National Natural Science Foundation of China (61562078); Xinjiang Tianshan Youth Program (2018Q073).
Keywords: malicious encrypted traffic; network security; random forest; SHAP model; interpretability
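SHAP attributes a model's output for one sample to its input features using Shapley values. The sketch below computes exact Shapley values by brute-force enumeration, to show what such an attribution means for a single flow. Everything in it is an assumption for illustration: `toy_risk` is a hypothetical three-feature scoring function, not the paper's random forest; the sample and baseline values are made up; and only the feature names are borrowed from the abstract. Enumeration is exponential in the number of features, so practical SHAP implementations rely on faster algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            # Weight of a coalition of size r in the Shapley formula.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                coalition = set(subset)
                # Marginal contribution of f when it joins this coalition.
                total += weight * (value_fn(coalition | {f}) - value_fn(coalition))
        phi[f] = total
    return phi

# Hypothetical toy model with one interaction term (session x goodput).
def toy_risk(x):
    return 0.5 * x["session"] + 0.25 * x["flow_duration"] + 0.5 * x["session"] * x["goodput"]

# Made-up sample to explain and made-up baseline ("feature absent") values.
sample = {"session": 1.0, "flow_duration": 2.0, "goodput": 1.0}
baseline = {"session": 0.0, "flow_duration": 0.0, "goodput": 0.0}

def coalition_value(coalition):
    """Model output when only the features in the coalition take the sample's values."""
    mixed = {f: (sample[f] if f in coalition else baseline[f]) for f in sample}
    return toy_risk(mixed)

phi = shapley_values(coalition_value, list(sample))
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

For this toy model the attributions come out to about 0.75 for session, 0.5 for flow_duration, and 0.25 for goodput: the interaction term is split evenly between the two features involved, and the three values sum to the difference between the model output on the sample and on the baseline (1.5).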