Abstract
Wrapper feature selection methods can select valuable features accurately, but their evaluation phase is computationally very expensive. This paper targets cross-validation, the most time-consuming step of wrapper feature selection, and proposes the LW-measure, a statistical measure that evaluates feature sets directly and can replace cross-validation. Combining the LW-measure with the sequential search strategies commonly used in wrapper methods yields two improved algorithms: sequential forward selection-LW (SFS-LW) and sequential backward selection-LW (SBS-LW). Three groups of comparative experiments on two University of California, Irvine (UCI) datasets show that the proposed algorithms achieve classification accuracy similar to that of traditional cross-validation-based wrapper methods while running several times, and up to nearly ten times, faster.
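The sequential forward search underlying SFS-LW can be illustrated with a minimal sketch in which the feature-set evaluator is a pluggable function. This record does not define the LW-measure, so the sketch does not implement it: `toy_score`, its relevance weights, and its redundancy penalty are hypothetical placeholders, and `sequential_forward_selection` is an illustrative name rather than the authors' code. The point is only that the search loop is unchanged whether the evaluator is a cross-validated classifier accuracy or a cheaper direct measure.

```python
from typing import Callable, FrozenSet, List, Tuple

# An evaluator maps a candidate feature subset to a score (higher is better).
# In a traditional wrapper this would be cross-validated accuracy; a direct
# measure such as the paper's LW-measure would be plugged in here instead.
Evaluator = Callable[[FrozenSet[int]], float]


def sequential_forward_selection(
    n_features: int, evaluate: Evaluator, k: int
) -> Tuple[List[int], float]:
    """Greedily add the feature whose inclusion scores best, until k are chosen."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < k:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break  # fewer than k features are available
        scored = [(evaluate(frozenset(selected + [f])), f) for f in candidates]
        # Highest score wins; ties go to the lowest feature index.
        best_score, best_feature = max(scored, key=lambda t: (t[0], -t[1]))
        selected.append(best_feature)
    return selected, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical stand-in evaluator (NOT the LW-measure)."""
    relevance = {0: 0.9, 1: 0.1, 2: 0.6, 3: 0.05}  # made-up per-feature weights
    redundant_pairs = [frozenset({0, 2})]  # assume features 0 and 2 overlap
    score = sum(relevance[f] for f in subset)
    score -= sum(0.55 for pair in redundant_pairs if pair <= subset)
    return score


if __name__ == "__main__":
    features, score = sequential_forward_selection(4, toy_score, k=2)
    print(features, score)
```

Each call to `evaluate` in a cross-validation-based wrapper trains and tests a classifier once per fold, and the loop makes one call per remaining candidate at every step, which is why replacing the evaluator is where the time savings would come from. Sequential backward selection follows the same pattern, starting from the full set and removing one feature per step.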
Source
Journal of University of Electronic Science and Technology of China
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2016, No. 4, pp. 607-615 (9 pages)
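Funding
Ministry of Education-China Mobile Research Fund (MCM20130661)
Sichuan Provincial Engineering Laboratory of Computer Network and Application Fund (20160001)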
Keywords
feature selection
sequence search algorithm
text classification
time complexity
wrapper methods