Abstract
To address the limitations of relying on a single feature selection method, a hybrid Lasso-RF (LRF) feature selection method is proposed and applied to the pricing of online short-term rental (home-sharing) listings. Using Airbnb listing data, the experiment first performs feature selection with Lasso regression to handle multicollinearity among features, then uses the random forest algorithm to refine the remaining features. This yields 35 important features, which are fed into four prediction models for comparison. The results show that multicollinearity among features affects how the random forest algorithm measures feature importance. Compared with the RF-RF prediction model, the LRF-RF prediction model improves the evaluation metrics R² and MSE by 0.005 and 0.006, respectively, and shortens the running time by 0.267 seconds, indicating that the LRF hybrid feature selection method outperforms RF feature selection alone.
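The two-stage procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `lrf_select`, the use of scikit-learn's `LassoCV` with 5-fold cross-validation, the impurity-based importance ranking, and the synthetic data are all assumptions, since the abstract does not give the hyperparameters, the importance measure, or the Airbnb feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def lrf_select(X, y, n_keep, random_state=0):
    """Hypothetical two-stage selection: Lasso filter, then random-forest ranking."""
    # Stage 1: fit Lasso on standardized features and keep only the columns
    # with non-zero coefficients (the L1 penalty zeroes out redundant ones).
    X_std = StandardScaler().fit_transform(X)
    lasso = LassoCV(cv=5, random_state=random_state).fit(X_std, y)
    survivors = np.flatnonzero(lasso.coef_ != 0)

    # Stage 2: rank the surviving columns by the forest's impurity-based
    # importance and keep the top n_keep (assumed ranking criterion).
    forest = RandomForestRegressor(n_estimators=100, random_state=random_state)
    forest.fit(X[:, survivors], y)
    order = np.argsort(forest.feature_importances_)[::-1]
    return survivors[order[:n_keep]]


# Synthetic stand-in data (not the Airbnb data): column 5 is a near-copy of
# column 0, which mimics multicollinearity between two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=300)
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

selected = lrf_select(X, y, n_keep=3)
print("selected feature indices:", selected)
```

In the paper's setting, `n_keep` would correspond to the 35 features retained, and the selected columns would then be passed to each of the four prediction models for comparison.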
Authors
ZHANG Hao (张浩)
ZHU Chen-long (朱晨龙)
College of Economics and Management, Jiangsu University of Science and Technology, Zhenjiang 212000, China
Source
Software Guide (《软件导刊》)
2020, No. 8, pp. 1-5 (5 pages)
Funding
Key Program of the National Natural Science Foundation of China (71331003).
Keywords
feature selection
Lasso
random forest
home-sharing accommodation
listing price