Abstract
To address the multi-class imbalance problem in Internet traffic classification, this paper proposes a hybrid feature selection approach based on relative uncertainty and symmetric uncertainty. First, it uses relative uncertainty to select a candidate feature subset for each class. Then, within each candidate subset, it retains the features with high symmetric uncertainty and discards the rest. Finally, it determines the optimal feature subset with a wrapper approach based on the C4.5 decision tree. Experimental results on real-world Internet traffic data sets show that, compared with traditional feature selection approaches, the proposed approach achieves higher overall accuracy, higher recall on minority classes, and a higher g-mean, thereby reducing the adverse effects of multi-class imbalance.
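Two quantities in the abstract have standard definitions that a short sketch can make concrete: symmetric uncertainty, SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), used in the second filtering step, and the g-mean, the geometric mean of per-class recalls, used in the evaluation. The sketch below assumes features are already discretized. The paper's relative-uncertainty criterion and its exact rule for "high" SU are not specified in the abstract, so the helper `keep_high_su` and its cut-off `k` are hypothetical illustrations, not the authors' method; whether SU is measured against the full class label or a per-class indicator is also an assumption here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value in [0, 1].

    x, y: equal-length sequences of discrete (already discretized) values.
    """
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables constant; 0 is a convention here
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy     # IG(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * info_gain / (h_x + h_y)


def keep_high_su(candidates, columns, labels, k):
    """Hypothetical step-2 helper: keep the k candidate features whose SU
    with the labels is highest (the paper's actual retention rule is unknown).

    columns: dict mapping feature name -> list of discretized values.
    """
    ranked = sorted(candidates,
                    key=lambda f: symmetric_uncertainty(columns[f], labels),
                    reverse=True)
    return ranked[:k]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls over the classes present in y_true.

    A single class with zero recall drives the g-mean to 0, so a classifier
    that ignores a minority class is penalized even if overall accuracy is high.
    """
    classes = sorted(set(y_true))
    product = 1.0
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        product *= hits / total
    return product ** (1.0 / len(classes))
```

For example, `symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1])` is 1.0 (the feature determines the label), while `symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1])` is 0.0 (independent). Normalizing by H(X) + H(Y) is what distinguishes SU from raw information gain, which favors features with many distinct values.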
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2017, Issue 2, pp. 568-571, 594 (5 pages)
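Funding
National Natural Science Foundation of China (61501289)
National Natural Science Foundation of China, Young Scientists Fund (61302093)
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20133108120018)
Major Project of the Science and Technology Commission of Shanghai Municipality (14511101505)
Strategic Priority Research Program of the Chinese Academy of Sciences, sub-project of "Research on Future Network System Architecture and Key Technologies" (XDA06010301)
Shanghai Sailing Program, Science and Technology Commission of Shanghai Municipality (14YF1408900)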
Keywords
Internet traffic
multi-class imbalance
feature selection
relative uncertainty
symmetric uncertainty