Abstract
ReliefF is widely regarded as an effective filter-style feature evaluation method, but a major drawback is that it cannot identify redundant features. This paper proposes two Relief-based hybrid feature selection algorithms, ReCorre and ReSBSW. Both first use ReliefF to filter out irrelevant features. ReCorre then removes redundant features by correlation analysis, while ReSBSW removes them with a wrapper that performs sequential backward search (SBS). Experiments on real and artificial datasets analyze and compare the performance of ReliefF, ReCorre, and ReSBSW, and lead to the following conclusions. ReliefF reduces dimensionality well on datasets with many irrelevant features. On real data where relationships among features are complex, however, it removes only a few irrelevant features and also removes some relevant ones. ReliefF cannot handle redundant features, whereas ReCorre removes most of them on top of ReliefF. ReSBSW achieves better generalization performance, but its high computational cost makes it unsuitable for large-scale datasets.
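The two-stage idea described above can be sketched in Python: ReliefF filtering first, then redundancy removal either by correlation (ReCorre-style) or by an SBS wrapper (ReSBSW-style). This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that do not come from the paper. Features are numeric and scaled to [0, 1], ReliefF uses Manhattan distance with k = 3 nearest hits and misses per class, a feature counts as relevant when its weight exceeds `tau = 0`, two features count as redundant when |Pearson r| is at least `rho = 0.9`, and the wrapper scores subsets by leave-one-out 1-nearest-neighbour accuracy. The function names `relieff`, `recorre`, `resbsw`, and `loo_1nn_accuracy` are hypothetical.

```python
import random
from math import sqrt

# Minimal sketch of the two-stage idea, NOT the paper's code. Assumptions:
# numeric features scaled to [0, 1], Manhattan distance for ReliefF, and
# leave-one-out 1-NN accuracy as the (hypothetical) wrapper criterion.


def relieff(X, y, k=3, m=None, seed=0):
    """Simplified ReliefF: returns one relevance weight per feature."""
    n, d = len(X), len(X[0])
    m = m or n
    rng = random.Random(seed)
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for _ in range(m):
        i = rng.randrange(n)  # sample an instance (with replacement)
        # k nearest neighbours of sample i inside every class (i itself excluded)
        near = {}
        for c in classes:
            cand = [j for j in range(n) if y[j] == c and j != i]
            near[c] = sorted(cand, key=lambda j: dist(X[i], X[j]))[:k]
        for f in range(d):
            hits = near[y[i]]
            if hits:  # near hits that differ on f lower its weight
                w[f] -= sum(abs(X[i][f] - X[j][f]) for j in hits) / (len(hits) * m)
            for c in classes:
                misses = near[c]
                if c == y[i] or not misses:
                    continue
                # near misses that differ on f raise its weight, scaled by class prior
                scale = prior[c] / (1 - prior[y[i]])
                w[f] += scale * sum(abs(X[i][f] - X[j][f]) for j in misses) / (len(misses) * m)
    return w


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb) if va and vb else 0.0


def recorre(X, y, tau=0.0, rho=0.9, **kw):
    """ReliefF filter, then drop features highly correlated with a stronger kept one."""
    w = relieff(X, y, **kw)
    relevant = sorted((f for f in range(len(w)) if w[f] > tau), key=lambda f: -w[f])
    cols = list(zip(*X))
    kept = []
    for f in relevant:
        if all(abs(pearson(cols[f], cols[g])) < rho for g in kept):
            kept.append(f)
    return kept


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on feats."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nn = min((j for j in range(len(X)) if j != i),
                 key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        correct += y[nn] == y[i]
    return correct / len(X)


def resbsw(X, y, tau=0.0, **kw):
    """ReliefF filter, then sequential backward search wrapped around 1-NN."""
    w = relieff(X, y, **kw)
    current = [f for f in range(len(w)) if w[f] > tau]
    best = loo_1nn_accuracy(X, y, current)
    while len(current) > 1:
        # evaluate every subset that drops exactly one feature
        acc, f = max((loo_1nn_accuracy(X, y, [g for g in current if g != f]), f)
                     for f in current)
        if acc < best:  # stop when even the best single removal lowers accuracy
            break
        best = acc
        current.remove(f)
    return current
```

On a toy dataset where one feature is an exact copy of another, `relieff` assigns both copies the same weight, so the filter alone keeps both. `recorre` keeps only the first copy. `resbsw` may shrink the set further, but each SBS step runs a full leave-one-out evaluation for every remaining feature. This cost pattern fits the abstract's remark that ReSBSW is expensive on large datasets.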
Source
Journal of Fudan University: Natural Science (《复旦学报(自然科学版)》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2004, No. 5, pp. 893-898 (6 pages)