Abstract
The proximity matrix of a random forest is treated as a special kernel measure, and a feature selection method is proposed that exploits this measure's robustness to model parameters and its sensitivity to changes in the features. The proximity matrix is used to compute the ratio of within-class to between-class proximity over the training samples. The values of each feature are then randomly permuted, and the resulting change in the proximity ratio serves as the importance measure for that feature, which yields a ranking of all features. Experimental results show that the method makes full use of the information in all samples, selects features effectively, and outperforms feature selection based on the out-of-bag (OOB) error rate estimate.
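The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the mean pairwise proximity, the averaging over `n_repeats` permutations, and the toy "forest" of three decision stumps (`STUMPS`, `toy_apply`) are all assumptions made for illustration. The only thing the sketch needs from a real forest is a function that maps each sample to the leaf it reaches in every tree.

```python
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]
# Hypothetical interface: maps samples to one leaf id per tree.
ApplyFn = Callable[[Matrix], List[List[int]]]


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> Matrix:
    """P[i][j] = fraction of trees in which samples i and j share a leaf."""
    n_trees = len(leaves[0])
    return [[sum(a == b for a, b in zip(li, lj)) / n_trees for lj in leaves]
            for li in leaves]


def proximity_ratio(prox: Matrix, y: Sequence[int]) -> float:
    """Mean within-class proximity divided by mean between-class proximity.

    Assumes at least two classes and at least one same-class pair.
    """
    within, between = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            (within if y[i] == y[j] else between).append(prox[i][j])
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return mean_within / max(mean_between, 1e-12)  # guard against zero


def proximity_importance(apply_fn: ApplyFn, X: Matrix, y: Sequence[int],
                         n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average drop in the proximity ratio after permuting each feature."""
    rng = random.Random(seed)
    base = proximity_ratio(proximity_matrix(apply_fn(X)), y)
    scores = []
    for f in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and y
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            ratio = proximity_ratio(proximity_matrix(apply_fn(X_perm)), y)
            total_drop += base - ratio
        scores.append(total_drop / n_repeats)
    return scores


# Toy stand-in for a trained forest: three stumps that split on feature 0 only.
STUMPS = [(0, 0.5), (0, 0.2), (0, 0.8)]


def toy_apply(X: Matrix) -> List[List[int]]:
    return [[int(row[f] > t) for f, t in STUMPS] for row in X]


# Feature 0 separates the classes; feature 1 is noise the stumps never use.
X = [[0.10, 0.5], [0.30, 0.2], [0.15, 0.9], [0.35, 0.4],
     [0.90, 0.3], [0.70, 0.8], [0.85, 0.1], [0.65, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

base_ratio = proximity_ratio(proximity_matrix(toy_apply(X)), y)
scores = proximity_importance(toy_apply, X, y)
ranking = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
print(base_ratio, scores, ranking)
```

With a trained scikit-learn forest, `apply_fn` could plausibly be a thin wrapper around `RandomForestClassifier.apply`, which returns the leaf index of each sample in each tree. Because the ratio is computed over all training pairs rather than only out-of-bag samples, the sketch reflects the abstract's point that the measure uses the information in every sample.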
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals
2010, Issue 4, pp. 58-61 (4 pages)
Journal of Huazhong University of Science and Technology (Natural Science Edition)
Funding
Supported by the Natural Science Foundation of Fujian Province (2009J05153)
Keywords
feature selection
measure
difference
proximity matrix
random forest
random permutation