Prediction model for malicious encrypted traffic with random forests and SHAP

Cited by: 1

Abstract: Encrypted traffic protects users' private information but can also conceal malicious behavior. Detecting malicious encrypted traffic early is key to defending against network attacks (e.g., distributed denial-of-service attacks, eavesdropping, and injection attacks) and to protecting networks from intrusion. Traditional detection methods based on ports or deep packet inspection struggle against sophisticated attacks that use code obfuscation or repackaging, while machine-learning-based methods suffer from high false-alarm rates and decision processes that are hard to interpret. To address these limitations in performance and interpretability, this paper proposes EPMRS, a highly interpretable model for malicious encrypted traffic detection. After data preprocessing (deduplication, re-encoding, and feature selection), a detection model is built on random forest and compared against 10 mainstream machine learning models, including logistic regression, KNN, and LGBM, using 5-fold cross-validation. The SHAP framework is then used to improve the model's interpretability at three levels: the overall model, the interaction effects of core risk features, and the decision process for individual samples. Empirical results on the MCCCU dataset show that EPMRS detects unknown malicious encrypted traffic with 99.996% accuracy and a 0.0003% misidentification rate, improving performance metrics by an average of 0.287175%~7.513175% over existing work. The interpretability analysis also identifies session, flow_duration (flow duration), and Goodput (effective throughput) as core risk factors affecting malicious encrypted traffic detection.
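The SHAP analysis mentioned in the abstract attributes a model's output for one sample to individual features using Shapley values. As a rough illustration of that idea only, the sketch below computes exact Shapley values for a toy scoring function over three features named in the abstract. The scoring function, the sample values, and the all-zero baseline are assumptions made up for this example; they are not the paper's random forest, its data, or the SHAP library's API, which uses faster tree-specific algorithms.

```python
from itertools import combinations
from math import factorial

# Feature names taken from the abstract; everything else here is hypothetical.
FEATURES = ["session", "flow_duration", "goodput"]


def toy_score(x):
    """Made-up risk score with one interaction term (not the paper's model)."""
    return (2.0 * x["session"]
            + 1.0 * x["flow_duration"]
            + 0.5 * x["session"] * x["goodput"])


def shapley_values(score, sample, baseline):
    """Exact Shapley values: features in a coalition take the sample's
    values, all other features are held at the baseline."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (sample[f] if f in coalition else baseline[f])
                 for f in FEATURES}
        return score(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_f = set(subset)
                with_f = without_f | {f}
                total += weight * (value(with_f) - value(without_f))
        phi[f] = total
    return phi


# Assumed sample and baseline, chosen only to keep the arithmetic simple.
sample = {"session": 1.0, "flow_duration": 3.0, "goodput": 4.0}
baseline = {"session": 0.0, "flow_duration": 0.0, "goodput": 0.0}

phi = shapley_values(toy_score, sample, baseline)
for name, contribution in phi.items():
    print(name, round(contribution, 6))
```

In this toy case the contributions sum to the difference between the score at the sample and the score at the baseline, and the interaction term between session and goodput is split evenly between those two features. This is the same additive property that makes per-sample SHAP explanations readable.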

Author: WU Yan (吴燕), School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China

Affiliation: School of Statistics and Data Science, Xinjiang University of Finance and Economics

Source: Journal of Harbin University of Commerce: Natural Sciences Edition (《哈尔滨商业大学学报（自然科学版）》), CAS, 2024, No. 2, pp. 167-178 (12 pages)

Funding: National Natural Science Foundation of China (61562078); Xinjiang Tianshan Youth Program (2018Q073).

Keywords: malicious encrypted traffic; network security; random forest; SHAP; model interpretability

Classification: TP393.08 [Automation and Computer Technology: Computer Application Technology]


