Abstract
To address the sample-labeling bottleneck in network traffic feature selection, and the inability of existing semi-supervised methods to select strongly relevant features, a multi-class Semi-supervised Feature Selection based on Extension of Label (SFSEL) algorithm is proposed. Starting from a small number of labeled samples, the algorithm first extends class labels to the unlabeled samples with the K-means algorithm; it then applies the Multi-class Doubly regularized Support Vector Machine (MDrSVM) algorithm to select features for multi-class data. In comparison experiments on the Moore data set against the semi-supervised feature selection algorithms Spectral, PCFRSC and SEFR, SFSEL achieved clearly higher classification accuracy and recall while selecting clearly fewer features. The experimental results show that SFSEL effectively improves the relevance of the selected features and yields better network traffic classification performance.
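The label-extension step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how clusters are initialised or how a cluster is mapped to a class, so the choices here are assumptions: one cluster per known class, each seeded at the mean of that class's labeled samples, with every cluster taking the majority label of its labeled members. The function name `extend_labels` and its parameters are hypothetical, and the MDrSVM feature selection stage is not shown.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def extend_labels(labeled, unlabeled, max_iter=100):
    """Propagate labels to unlabeled samples with a K-means pass.

    labeled:   list of (feature_vector, label) pairs
    unlabeled: list of feature vectors
    Returns one label per unlabeled sample, in input order.
    """
    # Assumption: one cluster per known class, seeded at the class mean.
    classes = sorted({y for _, y in labeled})
    centroids = [mean_point([x for x, y in labeled if y == c]) for c in classes]

    # Cluster labeled and unlabeled samples together.
    points = [x for x, _ in labeled] + list(unlabeled)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no sample changed cluster
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = mean_point(members)

    # Assumption: each cluster takes the majority label of its labeled
    # members, falling back to the class that seeded it.
    cluster_label = {}
    for k, seed_class in enumerate(classes):
        votes = Counter(y for (_, y), a in zip(labeled, assignment) if a == k)
        cluster_label[k] = votes.most_common(1)[0][0] if votes else seed_class

    return [cluster_label[a] for a in assignment[len(labeled):]]
```

With two labeled flows such as `([1.0, 1.0], "WWW")` and `([9.0, 9.0], "P2P")` (illustrative values, not taken from the Moore data set), unlabeled points near either one inherit that class. The extended label set would then be passed to the feature selection stage.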
Source
《计算机应用》
CSCD
PKU Core Journals (北大核心)
2014, No. 11, pp. 3206-3209 (4 pages)
Journal of Computer Applications
Funding
National Security Major Basic Research Program of China (613148)