Abstract
To address the low classification accuracy and long classification time of traditional machine learning in novel class recognition, this paper proposes a cascaded novel class recognition method based on information entropy. The method uses the voting mechanism of a random forest to compute and statistically analyze the information entropy of each sample. This entropy serves as the basis for novel class detection, separating known-class samples from candidate novel-class samples. Abnormal flow samples are then filtered out of the candidate novel classes to improve classification accuracy. Experiments show that the proposed method achieves a classification accuracy of about 95% on two real-world network datasets, the NJUPT dataset (NDset) and the ISCX dataset, with a classification time of only 0.079 ms per sample. Both its classification accuracy and its time performance are clearly better than those of representative methods from the literature.
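The entropy-based first stage described in the abstract can be illustrated with a minimal sketch. It assumes that each sample comes with the list of class labels voted by the individual trees of a random forest, and that a single fixed threshold separates known-class samples from candidate novel-class samples. The function names `vote_entropy` and `split_by_entropy` and the `threshold` parameter are hypothetical; the abstract does not specify how the decision boundary is derived from the statistical analysis, so this is not the authors' implementation.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (in bits) of the tree-vote distribution for one sample.

    `votes` is the list of class labels predicted by the individual trees.
    Unanimous votes give 0; evenly split votes give the maximum entropy.
    """
    total = len(votes)
    if total == 0:
        raise ValueError("votes must be non-empty")
    counts = Counter(votes)
    # Sum p * log2(1/p) over the classes that received at least one vote.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def split_by_entropy(all_votes, threshold):
    """Split sample indices into (known, candidates).

    Assumption: a sample whose vote entropy exceeds `threshold` is treated as
    a candidate novel-class sample; the threshold itself is hypothetical.
    """
    known, candidates = [], []
    for index, votes in enumerate(all_votes):
        if vote_entropy(votes) > threshold:
            candidates.append(index)
        else:
            known.append(index)
    return known, candidates
```

Under this assumption, a sample on which the trees agree has low entropy and keeps its known-class label, while a sample that splits the votes is passed on to the second stage, where abnormal flow samples are filtered out.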
Authors
曾文玺
董育宁
ZENG Wenxi; DONG Yuning (College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China)
Source
《软件工程》
2023, Issue 11, pp. 43-47 (5 pages)
Software Engineering
Keywords
network traffic classification
novel class detection
information entropy