Abstract
Building on a random forest prediction model, this paper proposes a feature selection method based on RFECV (recursive feature elimination with cross-validation). The feature variables are first one-hot encoded. RFECV's built-in cross-validation then evaluates the performance of each feature subset to determine the optimal number of features, and low-importance features are eliminated recursively. Experiments show that, with this method, the random forest trains and predicts faster, achieves a lower mean squared error, and extracts features with high accuracy.
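The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn's RFECV and RandomForestRegressor, not the author's implementation. The synthetic data, the column names (size, noise, city), and all hyperparameters (number of trees, five folds, step size) are assumptions chosen for the example. Negative mean squared error is used as the cross-validation score because the abstract reports mean squared error.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold

# Synthetic data (hypothetical): the target depends on "size" and on
# whether "city" equals "A"; "noise" is irrelevant by construction.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "size": rng.normal(size=n),
    "noise": rng.normal(size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
})
y = 3.0 * df["size"] + 2.0 * (df["city"] == "A") + rng.normal(scale=0.1, size=n)

# Step 1: one-hot encode the categorical variable.
X = pd.get_dummies(df, columns=["city"], dtype=float)

# Step 2: recursive feature elimination with cross-validation.
# At each round the least important feature (by random forest importance)
# is dropped, and every subset size is scored by 5-fold cross-validation.
selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    step=1,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
    min_features_to_select=1,
)
selector.fit(X, y)

# Step 3: read off the chosen subset.
selected = list(X.columns[selector.support_])
print("optimal number of features:", selector.n_features_)
print("selected features:", selected)
```

After fitting, `selector.transform(X)` returns only the retained columns, which can then be used to train the final random forest.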
Author
SUN Jing (孙晶)
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030000, China
Source
Digital Communication World (《数字通信世界》)
2024, No. 9, pp. 114-116 (3 pages)
Keywords
random forest prediction model
one-hot encoding
recursive feature elimination
cross-validation