Feature selection of random forest-based proximity matrix difference (cited by: 5)
Abstract: The proximity matrix of a random forest is treated as a special kernel measure. Exploiting the robustness of this measure to the model parameters and its sensitivity to changes in the features, a feature selection method is proposed. The proximity matrix is used to compute the ratio of within-class to between-class proximity over the training samples. The values of each feature are then randomly permuted, and the resulting change in the proximity ratio is taken as the importance measure for that feature, which yields a ranking of all features. Experimental results show that the method makes full use of the information in all samples, selects features effectively, and outperforms feature selection based on the out-of-bag (OOB) error rate estimate.
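The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the exact form of the proximity ratio, so the sketch assumes it is the mean within-class proximity divided by the mean between-class proximity, and it assumes the importance is averaged over a few repeated permutations. The function names, the `apply_fn` callable (which maps samples to per-tree leaf indices), `n_repeats`, and `seed` are all hypothetical names introduced for illustration.

```python
import numpy as np


def proximity_matrix(leaves):
    """Proximity of samples i and j = fraction of trees in which they share a leaf.

    leaves: array of shape (n_samples, n_trees) holding leaf indices.
    Note: this builds an (n_samples, n_samples, n_trees) array, so it only
    suits small data sets.
    """
    leaves = np.asarray(leaves)
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def proximity_ratio(prox, y, eps=1e-12):
    """Assumed ratio: mean within-class proximity / mean between-class proximity."""
    y = np.asarray(y)
    same_class = y[:, None] == y[None, :]
    off_diagonal = ~np.eye(len(y), dtype=bool)  # ignore each sample's self-proximity
    within = prox[same_class & off_diagonal].mean()
    between = prox[~same_class].mean()
    return within / (between + eps)  # eps guards against a zero denominator


def proximity_permutation_importance(apply_fn, X, y, n_repeats=5, seed=0):
    """Importance of feature j = average drop in the ratio after permuting column j."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = proximity_ratio(proximity_matrix(apply_fn(X)), y)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature/label link
            ratio = proximity_ratio(proximity_matrix(apply_fn(X_perm)), y)
            drops.append(baseline - ratio)
        importance[j] = np.mean(drops)
    return importance
```

With scikit-learn, one plausible choice for `apply_fn` is the `apply` method of a fitted `RandomForestClassifier`, which returns the leaf index of each sample in each tree; features would then be ranked by sorting the returned importances in descending order.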
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2010, No. 4, pp. 58-61 (4 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: Supported by the Natural Science Foundation of Fujian Province (2009J05153).
Keywords: feature selection; measurements; difference; proximity matrix; random forest; random permutation