Abstract
This paper proposes WRFFS, a weighted feature selection algorithm based on random forest. The algorithm uses classification accuracy as the criterion for screening feature subsets. It first constructs multiple decision trees on the data set and measures feature importance by cross-validation. The final feature-importance ranking is the weighted sum of the trees' feature-importance measures. Each tree's weight is set by how well that tree agrees with the prediction results. Sequential backward selection (SBS) is then applied to select features according to this ranking. Finally, comparative experiments on UCI data sets show that WRFFS achieves better classification performance than methods from the existing literature.
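The abstract describes the WRFFS pipeline only in outline. The following minimal Python sketch shows one possible reading of it and assumes that NumPy and scikit-learn are available. Several details are assumptions that the abstract does not state:

- A tree's weight is its accuracy on the held-out cross-validation fold, normalised so the weights sum to 1.
- Per-tree importance is scikit-learn's impurity-based `feature_importances_`.
- SBS is simplified to dropping the lowest-ranked remaining feature at each step, rather than trying every possible removal.

The function names `weighted_rf_importance` and `backward_select` are hypothetical. The synthetic `make_classification` data stand in for the paper's UCI data sets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def weighted_rf_importance(X, y, n_trees=50, n_splits=5, seed=0):
    """Cross-validated, tree-weighted feature importance (hypothetical sketch).

    Assumption: a tree's weight is its accuracy on the held-out fold,
    normalised so that the weights within each fold sum to 1.
    """
    scores = np.zeros(X.shape[1])
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in folds.split(X, y):
        forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        forest.fit(X[train_idx], y[train_idx])
        X_val, y_val = X[val_idx], y[val_idx]
        # Trees inside a scikit-learn forest predict encoded class indices,
        # so map them back to the original labels before comparing.
        acc = np.array([
            np.mean(forest.classes_[tree.predict(X_val).astype(int)] == y_val)
            for tree in forest.estimators_
        ])
        if acc.sum() > 0:
            weights = acc / acc.sum()
        else:
            weights = np.full(len(acc), 1.0 / len(acc))
        # Assumption: impurity-based importance of each individual tree.
        per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
        scores += weights @ per_tree  # weighted sum over trees
    return scores / n_splits  # average over folds


def backward_select(X, y, ranking_scores, n_trees=50, n_splits=5, seed=0):
    """Ranking-guided backward elimination scored by CV accuracy.

    Simplification of SBS: at each step remove the least important remaining
    feature instead of evaluating every candidate removal.
    """
    current = list(np.argsort(ranking_scores)[::-1])  # most -> least important
    best_subset, best_acc = list(current), -1.0
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    while current:
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        acc = cross_val_score(model, X[:, current], y, cv=cv).mean()
        if acc > best_acc:
            best_subset, best_acc = list(current), acc
        current = current[:-1]  # drop the lowest-ranked remaining feature
    return sorted(best_subset), best_acc


# Usage on synthetic data (not the paper's UCI data sets).
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
importance = weighted_rf_importance(X, y, n_trees=30)
selected, acc = backward_select(X, y, importance, n_trees=30)
print("ranking:", np.argsort(importance)[::-1])
print("selected:", [int(f) for f in selected], "cv accuracy: %.3f" % acc)
```

The sketch scores each candidate subset by cross-validated accuracy, which makes it a wrapper method. This is also the costly step, because every subset requires fitting a new forest on each fold.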
Authors
Xu Shaocheng (徐少成)
Li Dongxi (李东喜)
School of Mathematics, Taiyuan University of Technology, Taiyuan 030024, China
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal
2018, Issue 18, pp. 25-28 (4 pages)
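Funding
National Natural Science Foundation of China (11402157)
Shanxi Province Research Program for Returned Overseas Scholars (2015-032)
Scientific and Technological Innovation Program of Higher Education Institutions in Shanxi (2015121)
Shanxi Province Applied Basic Research Program (2016021013)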
Keywords
high-dimensional data
random forest
weighted feature selection
wrapper