Abstract
Random forest has clear advantages over other algorithms for predicting bike-sharing demand. On datasets that contain large amounts of redundant data, however, random forest is prone to overfitting. This paper therefore proposes FWRF, an improved algorithm based on random forest, to predict bike-sharing demand. FWRF first weights each feature by its correlation coefficient. It then divides the feature range into a high-correlation interval and a low-correlation interval, so that feature selection is restricted to a specific interval. This reduces the generalization error, strengthens the learning performance of the algorithm, and improves its prediction accuracy. Finally, FWRF is applied to the public New York Citi Bike dataset to analyze how bike-sharing demand changes under the influence of multidimensional heterogeneous data. Compared with the original algorithm, FWRF improves prediction accuracy by 5.1345%, which demonstrates the effectiveness and feasibility of the improved algorithm.
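The abstract describes FWRF only at a high level. The following Python sketch illustrates the feature-weighting and interval-restriction idea under explicit assumptions that are not stated in the paper. The weight is taken to be the absolute Pearson correlation between a feature and the demand target. The boundary between the high- and low-correlation intervals is the median weight. Split candidates are drawn only from the high-correlation interval, with probability proportional to weight. The function names `correlation_weights`, `split_intervals` and `sample_split_features` are hypothetical. Growing and averaging the trees is left to any standard random forest implementation.

```python
import numpy as np


def correlation_weights(X, y):
    """Absolute Pearson correlation |r| between each feature column of X and the target y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre each feature
    yc = y - y.mean()                # centre the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; their weight is defined as 0 here.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, (Xc * yc[:, None]).sum(axis=0) / denom, 0.0)
    return np.abs(r)


def split_intervals(weights, threshold=None):
    """Partition feature indices into a high- and a low-correlation interval.

    Assumption: the median weight is the cut-off unless a threshold is given;
    the paper's actual boundary rule is not stated in the abstract.
    """
    weights = np.asarray(weights, dtype=float)
    if threshold is None:
        threshold = np.median(weights)
    high = np.flatnonzero(weights >= threshold)
    low = np.flatnonzero(weights < threshold)
    return high, low


def sample_split_features(weights, high, m, rng):
    """Draw up to m distinct candidate features for one node split.

    Assumption: candidates come only from the high-correlation interval, with
    probability proportional to their weight (a tiny epsilon keeps every
    feature drawable even when its weight is 0).
    """
    w = np.asarray(weights, dtype=float)[high] + 1e-12
    k = min(m, len(high))
    return rng.choice(high, size=k, replace=False, p=w / w.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: only features 0 and 1 drive the target; 2-5 are noise.
    X = rng.normal(size=(500, 6))
    y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.normal(size=500)
    w = correlation_weights(X, y)
    high, low = split_intervals(w)
    print("weights:", np.round(w, 3))
    print("high interval:", high, "low interval:", low)
    print("candidates for one split:", sample_split_features(w, high, 2, rng))
```

This sketch reads "feature selection is limited to a specific range" as restricting split candidates to the high-correlation interval. That is only one plausible interpretation. A variant could instead draw a small share of candidates from the low-correlation interval to keep the trees diverse.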
Authors
ZHANG Xu
NIE Wenhui
(School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013)
Source
Computer & Digital Engineering
2021, No. 9, pp. 1860-1865 (6 pages)
Keywords
random forest
FWRF
correlation coefficient
multidimensional heterogeneous data
demand forecasting