
Forecast of Shared Bicycle Demand Based on an Improved Random Forest Algorithm (cited by: 1)
Abstract: Random forest algorithms have significant advantages over other algorithms in predicting the demand for shared bicycles. However, on data sets that contain a large amount of redundant data, random forest tends to overfit. To address this, the paper proposes FWRF, an improved algorithm based on random forest, to predict shared bicycle demand. The algorithm first weights each feature by its correlation coefficient, then divides the feature range into a high-correlation interval and a low-correlation interval so that feature selection is restricted to a specific range. This reduces generalization error, strengthens the algorithm's learning performance, and improves its prediction accuracy. Finally, the paper applies FWRF to the public NewYork CityBike dataset to analyze how shared bicycle demand changes under the influence of multidimensional heterogeneous data. Compared with the original algorithm, prediction accuracy improves by 5.1345%, which demonstrates the effectiveness and feasibility of the improved algorithm.
Authors: ZHANG Xu; NIE Wenhui (School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013)
Source: Computer & Digital Engineering, 2021, No. 9, pp. 1860-1865 (6 pages)
Keywords: random forest; FWRF; correlation coefficient; multidimensional heterogeneous; demand forecast
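The feature-weighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' FWRF implementation: the abstract does not say which variable the correlation is taken against, where the high/low boundary lies, or how candidates are drawn from each interval. The sketch assumes Pearson correlation against the demand target, the mean weight as the boundary, and a hypothetical `high_share` parameter that fixes what fraction of the candidate features at each split comes from the high-correlation group. All function names and the toy data are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def feature_weights(columns, target):
    """Weight each feature by |correlation with the target| (assumption)."""
    return {name: abs(pearson(col, target)) for name, col in columns.items()}


def split_by_correlation(weights, threshold=None):
    """Partition feature names into high- and low-correlation groups.

    The boundary defaults to the mean weight; this choice is an assumption,
    not something stated in the abstract.
    """
    if threshold is None:
        threshold = sum(weights.values()) / len(weights)
    high = sorted(n for n, w in weights.items() if w >= threshold)
    low = sorted(n for n, w in weights.items() if w < threshold)
    return high, low


def sample_candidates(high, low, k, high_share=0.75, rng=None):
    """Draw k distinct candidate features for one tree split.

    Roughly high_share of them come from the high group (hypothetical
    parameter); the remainder come from the low group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    k = min(k, len(high) + len(low))
    n_high = min(len(high), max(1, round(k * high_share))) if high else 0
    n_low = min(len(low), k - n_high)
    n_high = min(len(high), k - n_low)  # top up from high if low is short
    return rng.sample(high, n_high) + rng.sample(low, n_low)


# Toy data, invented for illustration only.
demand = [10, 20, 30, 40, 50, 60]
columns = {
    "temperature": [1, 2, 3, 4, 5, 6],
    "humidity": [6, 5, 4, 3, 2, 1],
    "noise": [3, 1, 3, 1, 3, 1],
    "constant": [7, 7, 7, 7, 7, 7],
}
weights = feature_weights(columns, demand)
high, low = split_by_correlation(weights)
print(high, low)
print(sample_candidates(high, low, k=3, rng=random.Random(0)))
```

In a full random forest, `sample_candidates` would replace the uniform feature subsampling performed at each node, so that weakly correlated (possibly redundant) features are considered less often without being excluded outright.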
