Abstract
Botnets are among the most serious network security problems of the big-data era. A botnet infects unprotected machines, keeps track of their communication with command-and-control (C&C) servers, and sends and receives malicious commands. Attackers use botnets to launch dangerous attacks such as DDoS, phishing, data theft, and spam. To address this problem, the paper applies ensemble learning to the identification of malicious traffic in the botnet traffic scenarios of the CTU-13 dataset. The dataset is first preprocessed at the session level: the local outlier factor (LOF) algorithm is used to filter out outliers, and features are then selected and constructed. In the traffic detection stage, the XGBoost algorithm is used to train an ensemble-learning traffic classifier, which is compared with three mainstream traditional machine learning algorithms: K-nearest neighbors (KNN), random forest, and support vector machine (SVM). The experiments show that XGBoost achieves the highest classification accuracy, reaching 99.89%. Finally, SHAP is used to visualize the contribution of key features to the classification task.
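The LOF filtering step named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function name `lof_scores`, the neighbor count `k=3`, the cutoff `1.5`, and the two toy session features are all assumptions chosen for illustration, since the abstract does not state the parameters or features used.

```python
import math

def lof_scores(points, k=3):
    """Return the local outlier factor of each point (values near 1 are inliers).

    Assumes len(points) > k and no more than k exact duplicates of any point.
    """
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_dist = [], []
    for i in range(n):
        # Rank the other points by their distance from point i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # The k-neighborhood includes any ties at the k-distance.
        neighbors.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical, already-scaled session features (for example duration and byte count).
sessions = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (8.0, 8.0)]
scores = lof_scores(sessions, k=3)

# Keep only sessions whose score is below an assumed cutoff of 1.5.
kept = [p for p, s in zip(sessions, scores) if s < 1.5]
print(kept)
```

In this sketch the isolated session at (8.0, 8.0) receives a score far above 1 and is dropped, while the clustered sessions score close to 1 and are kept. A real pipeline would more likely rely on a library implementation and tune the neighbor count and cutoff on the data.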
Author
WANG Hai-kuan (王海宽)
Department of Information Engineering, Jincheng Institute of Technology, Jincheng 048026, China
Source
Journal of Xi'an University (Natural Science Edition) (《西安文理学院学报(自然科学版)》)
2023, No. 4, pp. 27-34 (8 pages)
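Funding
Jincheng Institute of Technology (晋城职业技术学院) 2022 institute-level project (LX2216): "Research on Artificial-Intelligence-Based Network Traffic Analysis Technology"
Shanxi Province Education Science "14th Five-Year Plan" 2022 annual project (GH-221026): "Path Analysis of Implementing Network Security Technology under Cloud Computing"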
Keywords
network traffic classification
botnet
machine learning
dataset
algorithm