Abstract
To address the high computational cost of traditional random forests as the number of features grows, a random forest multi-feature permutation algorithm is proposed. The algorithm first clusters the features. Then, holding the other feature clusters unchanged, it randomly permutes all features of one cluster simultaneously, one cluster at a time, to obtain an importance score for every feature cluster and a ranking among clusters. Within each cluster, features are ranked by their correlation with the class information. A correlation threshold is introduced to select the important features, and the remaining features are ranked first between clusters and then within clusters. To further evaluate the effectiveness of the method, three feature selection algorithms based on random forest multi-feature permutation are designed, using K-means clustering, hierarchical clustering, and fuzzy C-means clustering respectively. Experimental results show that, compared with the traditional random forest method, the new algorithms still achieve relatively high classification accuracy when fewer features are selected, and are more time-efficient.
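The cluster-wise permutation and the between-then-within ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`permute_cluster`, `cluster_importance`, `final_ranking`, `toy_score`), the toy data, and the relevance values are hypothetical. The random forest is replaced by an arbitrary scoring function so that the example runs with the standard library alone. Applying one shared row permutation to every feature in a cluster, and listing the thresholded features in the same between-then-within order, are assumptions that the abstract does not confirm.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]
Score = Callable[[Sequence[Row], Sequence[int]], float]


def permute_cluster(X: Sequence[Row], cols: Sequence[int],
                    rng: random.Random) -> List[Row]:
    """Copy X and shuffle the columns in `cols` with one shared row permutation."""
    order = list(range(len(X)))
    rng.shuffle(order)
    permuted = [list(row) for row in X]
    for i, src in enumerate(order):
        for c in cols:
            permuted[i][c] = X[src][c]
    return permuted


def cluster_importance(score: Score, X: Sequence[Row], y: Sequence[int],
                       clusters: Dict[str, List[int]],
                       n_repeats: int = 5, seed: int = 0) -> Dict[str, float]:
    """Mean drop in score when one cluster is permuted and the others are kept."""
    rng = random.Random(seed)
    baseline = score(X, y)
    importance = {}
    for name, cols in clusters.items():
        drops = [baseline - score(permute_cluster(X, cols, rng), y)
                 for _ in range(n_repeats)]
        importance[name] = sum(drops) / n_repeats
    return importance


def final_ranking(cluster_order: Sequence[str], clusters: Dict[str, List[int]],
                  relevance: Dict[int, float], threshold: float) -> List[int]:
    """Features at or above the threshold first, then the rest; each group is
    ordered between clusters first and by relevance within a cluster."""
    ordered = [f for name in cluster_order
               for f in sorted(clusters[name], key=lambda f: -relevance[f])]
    selected = [f for f in ordered if relevance[f] >= threshold]
    remaining = [f for f in ordered if relevance[f] < threshold]
    return selected + remaining


# Toy data: columns 0 and 1 determine the label, column 2 is unrelated.
X = [[1.0, 1.0, (i % 7) / 7.0] if i % 2 else [-1.0, -1.0, (i % 7) / 7.0]
     for i in range(40)]
y = [1 if i % 2 else 0 for i in range(40)]


def toy_score(X_eval: Sequence[Row], y_eval: Sequence[int]) -> float:
    """Accuracy of a fixed rule; a stand-in for a trained random forest."""
    preds = [1 if row[0] + row[1] > 0 else 0 for row in X_eval]
    return sum(p == t for p, t in zip(preds, y_eval)) / len(y_eval)


clusters = {"A": [0, 1], "B": [2]}          # hypothetical feature clusters
importance = cluster_importance(toy_score, X, y, clusters)
cluster_order = sorted(importance, key=importance.get, reverse=True)
relevance = {0: 0.9, 1: 0.2, 2: 0.5}        # hypothetical feature-class relevance
ranking = final_ranking(cluster_order, clusters, relevance, threshold=0.4)
print(importance, cluster_order, ranking)
```

Each cluster costs one batch of permutations regardless of how many features it contains, so the number of model evaluations grows with the number of clusters rather than the number of features. This is consistent with the efficiency gain the abstract reports.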
Authors
WU Weijie (武炜杰)
ZHANG Jingxiang (张景祥)
School of Science, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2021, Issue 17, pp. 147-156 (10 pages)
Funding
National Natural Science Foundation of China (61772239, 11804123).
Keywords
feature selection
clustering
random forest
multi-feature permutation