Semi-Supervised Network Traffic Classification Based on Feature Selection and Improved Tri-training

Original title: 基于特征选择与改进的Tri-training的半监督网络流量分类

Abstract: Network traffic classification is important for network management, but current machine-learning-based traffic classification methods suffer from a labeling bottleneck and class imbalance. To address these two problems, a semi-supervised network traffic classification model is proposed that combines feature selection with an improved Tri-training algorithm. First, features that are highly correlated with the class but uncorrelated with each other are selected using the maximal information coefficient and the Pearson correlation coefficient; an improved ReliefF then selects features that favor minority-class classification; and the selected features are combined into an optimal feature subset to mitigate the effect of imbalanced data on classification. Next, the traditional Tri-training algorithm is improved by incorporating ensemble learning, optimized iteration, and weighted decision-making, and the improved algorithm is used to address the labeling bottleneck. Finally, experiments are conducted on the Moore dataset. The results show that, using only a small amount of imbalanced labeled data, the proposed method achieves an F-measure of 95.26% and outperforms advanced machine learning algorithms, the original Tri-training method, and several of its improved variants.
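The first feature-selection step described in the abstract (keep features that are strongly related to the class but not to each other) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the absolute Pearson correlation with a numeric class label as a simple stand-in for the maximal information coefficient, and the thresholds `relevance_min` and `redundancy_max`, the function names, and the toy feature names are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, relevance_min=0.5, redundancy_max=0.9):
    """Greedy relevance/redundancy filter.

    columns: dict mapping feature name -> list of numeric values.
    labels:  list of numeric class labels, same length as each column.
    """
    # Relevance: |Pearson| with the label (assumed stand-in for MIC).
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    # Consider only sufficiently relevant features, most relevant first.
    ranked = sorted(
        (name for name in columns if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Redundancy: skip a feature that nearly duplicates one already kept.
        if all(abs(pearson(columns[name], columns[kept])) <= redundancy_max
               for kept in selected):
            selected.append(name)
    return selected


# Toy data with hypothetical flow features; not taken from the Moore dataset.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "pkt_size": [1, 2, 1, 8, 9, 8],
    "pkt_size_x2": [2, 4, 2, 16, 18, 16],  # exact multiple of pkt_size
    "duration": [1, 1, 3, 2, 4, 4],
    "noise": [5, 1, 5, 1, 5, 1],
}
selected = select_features(columns, labels)
print(selected)
```

On the toy data, the weakly relevant `noise` column is filtered out by the relevance threshold, and only one of the two perfectly correlated packet-size columns is retained. The improved ReliefF step for minority classes and the Tri-training stage are not covered by this sketch.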

Authors: LI Daoquan; ZHU Shengkai; ZHAI Yuyang; HU Yifan (School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China)

Affiliation: School of Information and Control Engineering, Qingdao University of Technology

Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2024, No. 23, pp. 275-285 (11 pages)

Funding: Natural Science Foundation of Shandong Province (ZR2023MF052).

Keywords: semi-supervised network; class imbalance; network traffic classification; feature selection; Tri-training

Classification code: TP393.0 [Automation and Computer Technology: Computer Application Technology]
