Combination Feature Selection Based on Relief
基于Relief的组合式特征选择
Cited by: 44
Abstract: ReliefF is widely regarded as an effective filter-style feature evaluation method, but a major drawback is that it cannot identify redundant features. This paper proposes two Relief-based combination feature selection algorithms, ReCorre and ReSBSW: both first use ReliefF to filter out irrelevant features, and then remove redundant features with correlation analysis and with a sequential backward search (SBS) wrapper, respectively. Experiments on real and artificial datasets compare the performance of ReliefF, ReCorre, and ReSBSW and lead to the following conclusions. ReliefF reduces dimensionality well on datasets with many irrelevant features, but on real data with complex relationships among features it removes only a few irrelevant features and may also discard some relevant ones; it cannot handle redundant features at all. ReCorre removes most of the remaining redundant features on top of ReliefF. ReSBSW achieves better generalization performance, but its computational cost is high, so it is unsuitable for large-scale datasets.
Source: Journal of Fudan University (Natural Science) (《复旦学报(自然科学版)》), 2004, No. 5, pp. 893-898 (6 pages). Indexed in: CAS, CSCD, PKU Core.
Keywords: feature selection, ReliefF, Wrapper, redundancy, search, generalization performance, large-scale dataset

