
Random Forest Feature Selection Algorithm Based on Categorization Information and Application (Cited by: 14)
Abstract: Aiming at the high computational cost of the traditional random forest as the number of features grows, a random-forest multi-feature permutation algorithm is proposed. All features are first clustered; then the features within one cluster are randomly permuted simultaneously while the other clusters remain unchanged, yielding importance scores for all feature clusters and a ranking between clusters. Features within a cluster are ranked by their correlation with the classification information, and a correlation threshold is introduced to select the important features; the remaining features are ranked first between clusters, then within clusters. To further illustrate the effectiveness of the method, three corresponding random-forest multi-feature permutation feature-selection algorithms are designed, based on K-means, hierarchical, and fuzzy C-means clustering. Experimental results show that, compared with the traditional random forest method, the proposed algorithm achieves high classification accuracy with fewer selected features and is more time-efficient.
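The core idea in the abstract, permuting all features of one cluster at once instead of one feature at a time, can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: the use of K-means on the absolute feature-correlation matrix, the cluster count, the number of permutation repeats, and the wine dataset are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
base_acc = rf.score(X_te, y_te)

# Cluster the features (not the samples): each feature is represented by
# its row in the absolute correlation matrix, so correlated features
# tend to land in the same cluster.
n_clusters = 4
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(
    np.abs(np.corrcoef(X_tr, rowvar=False)))

# Cluster-wise permutation importance: permute all features of one
# cluster simultaneously (same row shuffle for the whole cluster),
# leaving the other clusters unchanged, and measure the accuracy drop.
cluster_importance = {}
for c in range(n_clusters):
    cols = np.flatnonzero(labels == c)
    drops = []
    for _ in range(10):  # repeat permutations to reduce variance
        X_perm = X_te.copy()
        X_perm[:, cols] = X_perm[rng.permutation(len(X_perm))][:, cols]
        drops.append(base_acc - rf.score(X_perm, y_te))
    cluster_importance[c] = float(np.mean(drops))

# Rank clusters by importance; within-cluster ranking by correlation
# with the class labels would follow the same pattern.
ranking = sorted(cluster_importance, key=cluster_importance.get, reverse=True)
print("cluster importance:", cluster_importance)
print("cluster ranking:", ranking)
```

Because one model evaluation scores a whole cluster rather than a single feature, the number of permutation passes scales with the number of clusters instead of the number of features, which is the source of the time savings the abstract claims.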
Authors: WU Weijie; ZHANG Jingxiang (School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China)
Affiliation: School of Science, Jiangnan University
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD / Peking University Core Journal, 2021, No. 17, pp. 147-156 (10 pages)
Funding: National Natural Science Foundation of China (61772239, 11804123)
Keywords: feature selection; clustering; random forest; multi-feature permutation

