Abstract
Machine learning approaches to network traffic classification suffer from reliance on a single feature selection metric, class imbalance, and concept drift, all of which increase model complexity and reduce generalization ability. This paper proposes an embedded feature selection method based on a selective ensemble strategy: a subset of feature selectors is chosen and combined according to the selective ensemble strategy, and a second-stage search for the optimal feature subset is then performed by combining an improved sequential forward search with a wrapper. Experimental results show that the algorithm effectively reduces the complexity of the feature subset while preserving classification performance, thereby achieving the best balance among classification performance, efficiency, and stability.
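The abstract does not state how selectors are chosen for the ensemble, how their outputs are merged, or how the sequential forward search is improved. The following is therefore only a minimal sketch of the general two-stage idea, under assumptions that are not taken from the paper: three hypothetical scoring functions stand in for the feature selectors, the selectors kept are those whose top-ranked features give the best leave-one-out accuracy, rankings are merged by mean rank, and a nearest-centroid classifier serves as the wrapper. All function names and the toy data are illustrative.

```python
from statistics import mean, pstdev

def corr_score(X, y, j):
    """Absolute Pearson correlation between feature j and the label."""
    xs = [row[j] for row in X]
    mx, my, sx, sy = mean(xs), mean(y), pstdev(xs), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return abs(mean((a - mx) * (b - my) for a, b in zip(xs, y)) / (sx * sy))

def fisher_score(X, y, j):
    """Gap between the two class means divided by the summed within-class spread."""
    x0 = [row[j] for row, c in zip(X, y) if c == 0]
    x1 = [row[j] for row, c in zip(X, y) if c == 1]
    return abs(mean(x0) - mean(x1)) / (pstdev(x0) + pstdev(x1) + 1e-9)

def variance_score(X, y, j):
    """Label-agnostic baseline: plain variance of feature j."""
    return pstdev([row[j] for row in X]) ** 2

def rank_features(scorer, X, y):
    """Feature indices ordered from highest to lowest score."""
    return sorted(range(len(X[0])), key=lambda j: -scorer(X, y, j))

def wrapper_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to feats."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(len(X)):
        cents = {}
        for c in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            cents[c] = [mean(r[j] for r in rows) for j in feats]
        def dist(c):
            return sum((X[i][j] - m) ** 2 for j, m in zip(feats, cents[c]))
        hits += min(cents, key=dist) == y[i]
    return hits / len(X)

def selective_ensemble(X, y, scorers, keep=2, top_k=2):
    """Assumed criterion: keep the best `keep` selectors, merge them by mean rank."""
    rankings = [rank_features(s, X, y) for s in scorers]
    quality = [wrapper_accuracy(X, y, r[:top_k]) for r in rankings]
    chosen = sorted(range(len(scorers)), key=lambda i: -quality[i])[:keep]
    n = len(X[0])
    mean_rank = [mean(rankings[i].index(j) for i in chosen) for j in range(n)]
    return sorted(range(n), key=lambda j: mean_rank[j])

def forward_search(X, y, order):
    """Walk the merged ranking; keep a feature only if wrapper accuracy improves."""
    selected, best = [], 0.0
    for j in order:
        acc = wrapper_accuracy(X, y, selected + [j])
        if acc > best:
            selected, best = selected + [j], acc
    return selected, best

# Toy data (illustrative only): feature 0 separates the classes, feature 1 is
# high-variance noise, feature 2 is constant, feature 3 is weakly informative.
X = [[0.1, 5.0, 1.0, 0.2], [0.2, -4.0, 1.0, 0.4], [0.0, 3.0, 1.0, 0.1],
     [0.3, -5.0, 1.0, 0.5], [1.0, 4.0, 1.0, 0.6], [0.9, -3.0, 1.0, 0.3],
     [1.1, 5.0, 1.0, 0.7], [0.8, -4.0, 1.0, 0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

order = selective_ensemble(X, y, [corr_score, fisher_score, variance_score])
selected, best = forward_search(X, y, order)
print("merged ranking:", order)
print("selected features:", selected, "accuracy:", best)
```

In this sketch the variance-based selector is the one dropped from the ensemble, because its top-ranked features include the noisy one; the forward search then stops growing the subset once adding a feature no longer improves the wrapper's accuracy, which is one simple way to trade subset size against classification performance.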
Source
Chinese Journal of Computers (《计算机学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2014, No. 10, pp. 2128-2138 (11 pages)
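Funding
Jiangsu Province Future Network Prospective Research Project (BY2013095-5-03)
Jiangsu Province Science and Technology Support Program (Industry) (BE2011173)
Jiangsu Province "Six Talent Peaks" High-Level Talent Project (2011-DZ024)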
Keywords
selective ensemble
feature selection
embedded
stability